Structure and randomness in combinatorics 
Abstract

Combinatorics, like computer science, often has to deal 
with large objects of unspecified (or unusable) structure. 
One powerful way to deal with such an arbitrary object is to 
decompose it into more usable components. In particular, 
it has proven profitable to decompose such objects into a 
structured component, a pseudo-random component, and a 
small component (i.e. an error term); in many cases it is the
structured component which then dominates. We illustrate 
this philosophy in a number of model cases. 



1. Introduction 

In many situations in combinatorics, one has to deal with 
an object of large complexity or entropy - such as a graph 
on N vertices, a function on N points, etc., with N large. 
We are often interested in the worst-case behaviour of such 
objects; equivalently, we are interested in obtaining results 
which apply to all objects in a certain class, as opposed to 
results for almost all objects (in particular, random or aver- 
age case behaviour) or for very specially structured objects. 
The difficulty here is that the spectrum of behaviour of an 
arbitrary large object can be very broad. At one extreme, 
one has very structured objects, such as complete bipartite 
graphs, or functions with periodicity, linear or polynomial 
phases, or other algebraic structure. At the other extreme 
are pseudorandom objects, which mimic the behaviour of 
random objects in certain key statistics (e.g. their correla- 
tions with other objects, or with themselves, may be close 
to those expected of random objects). 

Fortunately, there is a fundamental phenomenon that 
one often has a dichotomy between structure and pseudo- 
randomness, in that given a reasonable notion of structure 
(or pseudorandomness), there often exists a dual notion of 
pseudorandomness (or structure) such that an arbitrary ob- 
ject can be decomposed into a structured component and 
a pseudorandom component (possibly with a small error). 
Here are two simple examples of such decompositions: 



(i) An orthogonal decomposition $f = f_{str} + f_{psd}$ of a
vector $f$ in a Hilbert space into its orthogonal projection
$f_{str}$ onto a subspace $V$ (which represents the
"structured" objects), plus its orthogonal projection
$f_{psd}$ onto the orthogonal complement $V^\perp$ of $V$ (which
represents the "pseudorandom" objects).

(ii) A thresholding $f = f_{str} + f_{psd}$ of a vector $f$, where
$f$ is expressed in terms of some basis $v_1, \ldots, v_n$ (e.g.
a Fourier basis) as $f = \sum_{1 \le i \le n} c_i v_i$, the "structured"
component $f_{str} := \sum_{i : |c_i| \ge \lambda} c_i v_i$ contains the
contribution of the large coefficients, and the "pseudorandom"
component $f_{psd} := \sum_{i : |c_i| < \lambda} c_i v_i$ contains the
contribution of the small coefficients. Here $\lambda > 0$
is a thresholding parameter which one is at liberty to
choose.
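As a small illustration of the thresholding decomposition (ii), the
following Python sketch splits a list of basis coefficients at a
threshold. The function name `threshold_split` and the sample
coefficients are assumptions made for illustration only.

```python
def threshold_split(coeffs, lam):
    """Split coefficients c_i into a 'structured' part (|c_i| >= lam)
    and a 'pseudorandom' part (|c_i| < lam); the two parts sum to coeffs."""
    structured = [c if abs(c) >= lam else 0.0 for c in coeffs]
    pseudorandom = [c if abs(c) < lam else 0.0 for c in coeffs]
    return structured, pseudorandom

# Hypothetical coefficients of f with respect to a basis v_1, ..., v_n.
coeffs = [0.9, -0.05, 0.4, 0.02, -0.6]
f_str, f_psd = threshold_split(coeffs, lam=0.3)
```

Raising the threshold moves more coefficients into the pseudorandom
part, which is the tradeoff the parameter is meant to control.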

Indeed, many of the decompositions we discuss here can 
be viewed as variants or perturbations of these two simple 
decompositions. More advanced examples of decompositions
include the Szemerédi regularity lemma for graphs
(and hypergraphs), as well as various structure theorems
relating to the Gowers uniformity norms, used for instance in
[16], [18]. Some decompositions from classical analysis,
most notably the spectral decomposition of a self-adjoint
operator into orthogonal subspaces associated with the pure
point, singular continuous, and absolutely continuous
spectrum, also have a similar spirit to the structure-randomness
dichotomy.

The advantage of utilising such a decomposition is that 
one can use different techniques to handle the structured 
component and the pseudorandom component (as well as 
the error component, if it is present). Broadly speaking, 
the structured component is often handled by algebraic or 
geometric tools, or by reduction to a "lower complexity" 
problem than the original problem, whilst the contribution 
of the pseudorandom and error components is shown to be 
negligible by using inequalities from analysis (which can 
range from the humble Cauchy-Schwarz inequality to other, 
much more advanced, inequalities). A particularly notable 
use of this type of decomposition occurs in the many different
proofs of Szemerédi's theorem [24]; see e.g. [30] for
further discussion.

In order to make the above general strategy more con- 
crete, one of course needs to specify more precisely what 
"structure" and "pseudorandomness" means. There is no 
single such definition of these concepts, of course; it de- 
pends on the application. In some cases, it is obvious what 
the definition of one of these concepts is, but then one has 
to do a non-trivial amount of work to describe the dual concept
in some useful manner. We remark that computational
notions of structure and randomness do seem to fall into 
this framework, but thus far all the applications of this di- 
chotomy have focused on much simpler notions of structure 
and pseudorandomness, such as those associated to Reed- 
Muller codes. 

In these notes we give some illustrative examples of this 
structure-randomness dichotomy. While these examples are 
somewhat abstract and general in nature, they should by no 
means be viewed as the definitive expressions of this di- 
chotomy; in many applications one needs to modify the ba- 
sic arguments given here in a number of ways. On the other 
hand, the core ideas in these arguments (such as a reliance 
on energy-increment or energy-decrement methods) appear 
to be fairly universal. The emphasis here will be on illus- 
trating the "nuts-and-bolts" of structure theorems; we leave 
the discussion of the more advanced structure theorems and 
their applications to other papers. 

One major topic we will not be discussing here (though 
it is lurking underneath the surface) is the role of ergodic 
theory in all of these decompositions; we refer the reader 
to [30] for further discussion. Similarly, the recent ergodic-
theoretic approaches to hypergraph regularity, removal, and 
property testing in ||3T1 . ^ will not be discussed here, in or- 
der to prevent the exposition from becoming too unfocused. 
The lecture notes here also have some intersection with the 
author's earlier article IZTll . 

2. Structure and randomness in a Hilbert space 

Let us first begin with a simple case, in which the objects
one is studying lie in some real finite-dimensional Hilbert
space $H$, and the concept of structure is captured by some
known set $S$ of "basic structured objects". This setting is
already strong enough to establish the Szemerédi regularity
lemma, as well as variants such as Green's arithmetic
regularity lemma. One should think of the dimension of $H$ as
being extremely large; in particular, we do not want any of
our quantitative estimates to depend on this dimension.

More precisely, let us designate a finite collection $S \subset H$
of "basic structured" vectors of bounded length; we assume
for concreteness that $\|v\|_H \le 1$ for all $v \in S$. We
would like to view elements of $H$ which can be "efficiently
represented" as linear combinations of vectors in $S$ as
structured, and vectors which have low correlation (or more
precisely, small inner product) to all vectors in $S$ as
pseudorandom. More precisely, given $f \in H$, we say that $f$ is
$(M, K)$-structured for some $M, K > 0$ if one has a decomposition

$$f = \sum_{1 \le i \le M} c_i v_i$$

with $v_i \in S$ and $c_i \in [-K, K]$ for all $1 \le i \le M$. We
also say that $f$ is $\varepsilon$-pseudorandom for some $\varepsilon > 0$ if we
have $|\langle f, v \rangle_H| \le \varepsilon$ for all $v \in S$. It is helpful to keep some
model examples in mind:

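Example 2.1 (Fourier structure). Let $\mathbf{F}_2^n$ be a Hamming
cube; we identify the finite field $\mathbf{F}_2$ with $\{0, 1\}$ in the usual
manner. We let $H$ be the $2^n$-dimensional space of functions
$f : \mathbf{F}_2^n \to \mathbf{R}$, endowed with the inner product

$$\langle f, g \rangle_H := \frac{1}{2^n} \sum_{x \in \mathbf{F}_2^n} f(x) g(x),$$

and let $S$ be the space of characters,

$$S := \{ e_\xi : \xi \in \mathbf{F}_2^n \},$$

where for each $\xi \in \mathbf{F}_2^n$, $e_\xi$ is the function $e_\xi(x) := (-1)^{x \cdot \xi}$.
Informally, a structured function $f$ is then one which can
be expressed in terms of a small number (e.g. $O(1)$) of
characters, whereas a pseudorandom function $f$ would be one
whose Fourier coefficients

$$\hat{f}(\xi) := \langle f, e_\xi \rangle_H \qquad (1)$$

are all small.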
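To make the Fourier example concrete, here is a minimal Python sketch
that computes the coefficients (1) by brute force on a small Hamming
cube. It assumes the inner product is normalised by $2^{-n}$ (so that
characters have norm 1); the helper names `character` and
`fourier_coefficient` are illustrative and not from the text.

```python
from itertools import product

def character(xi):
    """Return e_xi : x -> (-1)^(x . xi) on the Hamming cube."""
    return lambda x: (-1) ** (sum(a * b for a, b in zip(x, xi)) % 2)

def fourier_coefficient(f, xi, n):
    """Normalised inner product <f, e_xi>_H = 2^-n * sum_x f(x) e_xi(x)."""
    e = character(xi)
    return sum(f(x) * e(x) for x in product((0, 1), repeat=n)) / 2 ** n

n = 3
# A highly structured function: the single character with xi = (1, 0, 1).
f = character((1, 0, 1))
coeffs = {xi: fourier_coefficient(f, xi, n) for xi in product((0, 1), repeat=n)}
```

For this choice of `f` exactly one coefficient is non-zero, which is
the extreme case of a function built from $O(1)$ characters.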

Example 2.2 (Reed-Muller structure). Let $H$ be as in the
previous example, and let $1 \le k \le n$. We now let
$S = S_k(\mathbf{F}_2^n)$ be the space of Reed-Muller codes $(-1)^{P(x)}$,
where $P : \mathbf{F}_2^n \to \mathbf{F}_2$ is any polynomial of $n$ variables with
$\mathbf{F}_2$ coefficients and degree at most $k$. For $k = 1$, this gives the
same notions of structure and pseudorandomness as the previous
example, but as we increase $k$, we enlarge the class
of structured functions and shrink the class of pseudorandom
functions. For instance, the function
$(x_1, \ldots, x_n) \mapsto (-1)^{\sum_{1 \le i < j \le n} x_i x_j}$
would be considered highly pseudorandom
when $k = 1$ but highly structured for $k \ge 2$.

Example 2.3 (Product structure). Let $V$ be a set of $|V| = n$
vertices, and let $H$ be the $n^2$-dimensional space of functions
$f : V \times V \to \mathbf{R}$, endowed with the inner product

$$\langle f, g \rangle_H := \frac{1}{n^2} \sum_{v, w \in V} f(v, w) g(v, w).$$

Note that any graph $G = (V, E)$ can be identified with an
element of $H$, namely the indicator function $1_E : V \times V \to \{0, 1\}$
of the set of edges. We let $S$ be the collection of
tensor products $(v, w) \mapsto 1_A(v) 1_B(w)$, where $A, B$ are
subsets of $V$. Observe that $1_E$ will be quite structured if
$G$ is a complete bipartite graph, or the union of a bounded
number of such graphs. At the other extreme, if $G$ is an
$\varepsilon$-regular graph of some edge density $0 < \delta < 1$ for some
$0 < \varepsilon < 1$, in the sense that the number of edges between
$A$ and $B$ differs from $\delta |A| |B|$ by at most $\varepsilon |A| |B|$
whenever $A, B \subset V$ with $|A|, |B| \ge \varepsilon n$, then $1_E - \delta$ will be
$O(\varepsilon)$-pseudorandom.
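In the product setting, correlating an edge indicator with a tensor
product simply counts edges between two vertex sets. The Python sketch
below illustrates this under the assumption that the inner product is
normalised by $1/n^2$; the function name `edge_correlation` and the
small graph are hypothetical.

```python
def edge_correlation(edges, A, B, n):
    """Inner product <1_E, 1_A (x) 1_B>_H with the 1/n^2 normalisation:
    the number of pairs (v, w) in A x B that are edges, divided by n^2."""
    return sum(1 for v in A for w in B if (v, w) in edges) / n ** 2

# Complete bipartite graph between {0, 1} and {2, 3}, stored as ordered pairs.
n = 4
left, right = {0, 1}, {2, 3}
edges = {(v, w) for v in left for w in right} | {(w, v) for v in left for w in right}

across = edge_correlation(edges, left, right, n)   # all 4 pairs are edges
within = edge_correlation(edges, left, left, n)    # no edges inside a part
```

A complete bipartite graph correlates strongly with the tensor product
of its two parts, which is why its indicator counts as structured here.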

We are interested in obtaining quantitative answers to the
following general problem: given an arbitrary bounded
element $f$ of the Hilbert space $H$ (let us say $\|f\|_H \le 1$ for
concreteness), can we obtain a decomposition

$$f = f_{str} + f_{psd} + f_{err} \qquad (2)$$

where $f_{str}$ is a structured vector, $f_{psd}$ is a pseudorandom
vector, and $f_{err}$ is some small error?

One obvious "qualitative" decomposition arises from using
the vector space $\mathrm{span}(S)$ spanned by the basic structured
vectors $S$. If we let $f_{str}$ be the orthogonal projection
of $f$ to this vector space, and set $f_{psd} := f - f_{str}$ and
$f_{err} := 0$, then we have perfect control on the pseudorandom
and error components: $f_{psd}$ is 0-pseudorandom and
$f_{err}$ has norm 0. On the other hand, the only control on $f_{str}$
we have is the qualitative bound that it is $(M, K)$-structured
for some finite $M, K < \infty$. In the three examples given
above, the vectors $S$ in fact span all of $H$, and this
decomposition is in fact trivial!

We would thus like to perform a tradeoff, increasing our
control of the structured component at the expense of
worsening our control on the pseudorandom and error components.
We can see how to achieve this by recalling how
the orthogonal projection of $f$ to $\mathrm{span}(S)$ is actually
constructed; it is the vector $v$ in $\mathrm{span}(S)$ which minimises the
"energy" $\|f - v\|_H^2$ of the residual $f - v$. The key point is
that if $v \in \mathrm{span}(S)$ is such that $f - v$ has a non-zero inner
product with a vector $w \in S$, then it is possible to move $v$
in the direction $w$ to decrease the energy $\|f - v\|_H^2$. We can
make this latter point more quantitative:

Lemma 2.4 (Lack of pseudorandomness implies energy
decrement). Let $H, S$ be as above. Let $f \in H$ be a vector
with $\|f\|_H^2 \le 1$, such that $f$ is not $\varepsilon$-pseudorandom
for some $0 < \varepsilon \le 1$. Then there exists $v \in S$ and
$c \in [-1/\varepsilon, 1/\varepsilon]$ such that $|\langle f, v \rangle_H| > \varepsilon$ and
$\|f - cv\|_H^2 \le \|f\|_H^2 - \varepsilon^2$.

Proof. By hypothesis, we can find $v \in S$ such that
$|\langle f, v \rangle_H| > \varepsilon$, thus by Cauchy-Schwarz and hypothesis on
$S$

$$1 \ge \|v\|_H \ge |\langle f, v \rangle_H| > \varepsilon.$$

We then set $c := \langle f, v \rangle_H / \|v\|_H^2$ (i.e. $cv$ is the orthogonal
projection of $f$ to the span of $v$). The claim then follows
from Pythagoras' theorem. □
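For convenience, the Pythagoras step can be written out explicitly.
The following is a routine expansion under the hypotheses of Lemma 2.4
(with $c$ the projection coefficient chosen in the proof), offered as a
reading aid rather than as part of the original argument. The first line
uses $|\langle f, v\rangle_H| \ge \varepsilon$ and $\|v\|_H \le 1$; the second
uses Cauchy-Schwarz, $\|f\|_H \le 1$ and $\|v\|_H \ge \varepsilon$.

```latex
\begin{align*}
\|f - cv\|_H^2
  &= \|f\|_H^2 - 2c\,\langle f, v\rangle_H + c^2 \|v\|_H^2
   = \|f\|_H^2 - \frac{|\langle f, v\rangle_H|^2}{\|v\|_H^2}
   \le \|f\|_H^2 - \varepsilon^2, \\
|c|
  &= \frac{|\langle f, v\rangle_H|}{\|v\|_H^2}
   \le \frac{\|f\|_H}{\|v\|_H}
   \le \frac{1}{\varepsilon}.
\end{align*}
```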



If we iterate this by a straightforward greedy algorithm 
argument we now obtain 

Corollary 2.5 (Non-orthogonal weak structure theorem).
Let $H, S$ be as above. Let $f \in H$ be such that $\|f\|_H \le 1$,
and let $0 < \varepsilon \le 1$. Then there exists a decomposition
(2) such that $f_{str}$ is $(1/\varepsilon^2, 1/\varepsilon)$-structured, $f_{psd}$ is
$\varepsilon$-pseudorandom, and $f_{err}$ is zero.

Proof. We perform the following algorithm.

• Step 0. Initialise $f_{str} := 0$, $f_{err} := 0$, and $f_{psd} := f$.
Observe that $\|f_{psd}\|_H^2 \le 1$.

• Step 1. If $f_{psd}$ is $\varepsilon$-pseudorandom then STOP.
Otherwise, by Lemma 2.4, we can find $v \in S$ and
$c \in [-1/\varepsilon, 1/\varepsilon]$ such that
$\|f_{psd} - cv\|_H^2 \le \|f_{psd}\|_H^2 - \varepsilon^2$.

• Step 2. Replace $f_{psd}$ by $f_{psd} - cv$ and replace $f_{str}$ by
$f_{str} + cv$. Now return to Step 1.

It is clear that the "energy" $\|f_{psd}\|_H^2$ decreases by at least
$\varepsilon^2$ with each iteration of this algorithm, and thus this
algorithm terminates after at most $1/\varepsilon^2$ such iterations. The
claim then follows. □
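The greedy iteration in this proof can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: vectors
are lists of floats, the inner product is the normalised one
$\langle f, g\rangle_H = \frac{1}{N}\sum_x f(x) g(x)$, and the names
`inner` and `weak_decomposition` are hypothetical.

```python
def inner(f, g):
    """Normalised inner product <f, g>_H = (1/N) * sum_x f(x) g(x)."""
    return sum(a * b for a, b in zip(f, g)) / len(f)

def weak_decomposition(f, S, eps):
    """Greedy energy decrement: returns (f_str, f_psd, steps), where steps
    lists the pairs (index of v in S, coefficient c) chosen along the way."""
    f_str = [0.0] * len(f)
    f_psd = list(f)
    steps = []
    while True:
        # Look for a basic structured vector correlating with f_psd.
        hits = [i for i, v in enumerate(S) if abs(inner(f_psd, v)) > eps]
        if not hits:
            return f_str, f_psd, steps      # f_psd is eps-pseudorandom
        i = hits[0]
        v = S[i]
        c = inner(f_psd, v) / inner(v, v)   # projection coefficient
        f_psd = [a - c * b for a, b in zip(f_psd, v)]
        f_str = [a + c * b for a, b in zip(f_str, v)]
        steps.append((i, c))

# The four characters of the Hamming cube F_2^2, and a point indicator f.
S = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
f = [1.0, 0.0, 0.0, 0.0]
eps = 0.2
f_str, f_psd, steps = weak_decomposition(f, S, eps)
```

In this toy run every character correlates with `f` at level 0.25, so
the loop removes all four components and stops well within the
$1/\varepsilon^2$ iteration bound.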

Corollary 2.5 is not very useful in applications, because
the control on the structure of $f_{str}$ is relatively poor
compared to the pseudorandomness of $f_{psd}$ (or vice versa). One
can do substantially better here, by allowing the error term
$f_{err}$ to be non-zero. More precisely, we have

Theorem 2.6 (Strong structure theorem). Let $H, S$ be as
above, let $\varepsilon > 0$, and let $F : \mathbf{Z}^+ \to \mathbf{R}^+$ be an
arbitrary function. Let $f \in H$ be such that $\|f\|_H \le 1$. Then
we can find an integer $M = O_{F,\varepsilon}(1)$ and a decomposition
(2) where $f_{str}$ is $(M, M)$-structured, $f_{psd}$ is
$1/F(M)$-pseudorandom, and $f_{err}$ has norm at most $\varepsilon$.

Here and in the sequel, we use subscripts in the $O()$
asymptotic notation to denote that the implied constant
depends on the subscripts. For instance, $O_{F,\varepsilon}(1)$ denotes a
quantity bounded by $C_{F,\varepsilon}$, for some quantity $C_{F,\varepsilon}$
depending only on $F$ and $\varepsilon$. Note that the pseudorandomness of
$f_{psd}$ can be of arbitrarily high quality compared to the
complexity of $f_{str}$, since we can choose $F$ to be whatever we
please; the cost of doing so, of course, is that the upper
bound on $M$ becomes worse when $F$ is more rapidly growing.

To prove Theorem 2.6 we first need a variant of Corollary 2.5
which gives some orthogonality between $f_{str}$ and
$f_{psd}$, at the cost of worsening the complexity bound on $f_{str}$.

Lemma 2.7 (Orthogonal weak structure theorem). Let $H, S$
be as above. Let $f \in H$ be such that $\|f\|_H \le 1$, and let
$0 < \varepsilon \le 1$. Then there exists a decomposition (2) such that
$f_{str}$ is $(1/\varepsilon^2, O_\varepsilon(1))$-structured, $f_{psd}$ is $\varepsilon$-pseudorandom,
$f_{err}$ is zero, and $\langle f_{str}, f_{psd} \rangle_H = 0$.



Proof. We perform a slightly different iteration to that in
Corollary 2.5, where we insert an additional orthogonalisation
step within the iteration to a subspace $V$:

• Step 0. Initialise $V := \{0\}$ and $f_{err} := 0$.

• Step 1. Set $f_{str}$ to be the orthogonal projection of $f$ to
$V$, and $f_{psd} := f - f_{str}$.

• Step 2. If $f_{psd}$ is $\varepsilon$-pseudorandom then STOP.
Otherwise, by Lemma 2.4, we can find $v \in S$ and
$c \in [-1/\varepsilon, 1/\varepsilon]$ such that $|\langle f_{psd}, v \rangle_H| > \varepsilon$ and
$\|f_{psd} - cv\|_H^2 \le \|f_{psd}\|_H^2 - \varepsilon^2$.

• Step 3. Replace $V$ by $\mathrm{span}(V \cup \{v\})$, and return to
Step 1.

Note that at each stage, $\|f_{psd}\|_H$ is the minimum distance
from $f$ to $V$. Because of this, we see that $\|f_{psd}\|_H^2$
decreases by at least $\varepsilon^2$ with each iteration, and so this
algorithm terminates in at most $1/\varepsilon^2$ steps.

Suppose the algorithm terminates in $M$ steps for some
$M \le 1/\varepsilon^2$. Then we have constructed a nested flag

$$\{0\} = V_0 \subset V_1 \subset \ldots \subset V_M$$

of subspaces, where each $V_i$ is formed from $V_{i-1}$ by adjoining
a vector $v_i$ in $S$. Furthermore, by construction we have
$|\langle f_i, v_i \rangle_H| > \varepsilon$ for some vector $f_i$ of norm at most 1 which is
orthogonal to $V_{i-1}$. Because of this, we see that $v_i$ makes
an angle of $\Theta_\varepsilon(1)$ with $V_{i-1}$. As a consequence of this and
the Gram-Schmidt orthogonalisation process, we see that
$v_1, \ldots, v_i$ is a well-conditioned basis of $V_i$, in the sense that
any vector $w \in V_i$ can be expressed as a linear combination
of $v_1, \ldots, v_i$ with coefficients of size $O_{\varepsilon,i}(\|w\|_H)$. In
particular, since $f_{str}$ has norm at most 1 (by Pythagoras'
theorem) and lies in $V_M$, we see that $f_{str}$ is a linear combination
of $v_1, \ldots, v_M$ with coefficients of size $O_{M,\varepsilon}(1) = O_\varepsilon(1)$,
and the claim follows. □
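The orthogonalised iteration can be sketched as follows. This is a
minimal Python illustration, not the author's implementation: it assumes
the normalised inner product $\frac{1}{N}\sum_x f(x) g(x)$, keeps an
orthonormal basis of $V$ via Gram-Schmidt, and uses the hypothetical
names `inner` and `orthogonal_weak_decomposition`.

```python
def inner(f, g):
    """Normalised inner product <f, g>_H = (1/N) * sum_x f(x) g(x)."""
    return sum(a * b for a, b in zip(f, g)) / len(f)

def orthogonal_weak_decomposition(f, S, eps):
    """Energy decrement with re-projection: V grows by one vector of S
    per step and f_str is always the orthogonal projection of f onto V."""
    basis = []                      # orthonormal basis of V (Gram-Schmidt)
    chosen = []                     # indices of the adjoined vectors of S
    while True:
        f_str = [0.0] * len(f)
        for u in basis:
            c = inner(f, u)
            f_str = [a + c * b for a, b in zip(f_str, u)]
        f_psd = [a - b for a, b in zip(f, f_str)]
        hits = [i for i, v in enumerate(S) if abs(inner(f_psd, v)) > eps]
        if not hits:
            return f_str, f_psd, chosen
        v = S[hits[0]]
        # Orthogonalise v against V; the remainder has norm > eps because
        # f_psd is orthogonal to V yet correlates with v.
        w = list(v)
        for u in basis:
            c = inner(w, u)
            w = [a - c * b for a, b in zip(w, u)]
        norm = inner(w, w) ** 0.5
        basis.append([a / norm for a in w])
        chosen.append(hits[0])

# Hypothetical non-orthogonal structured vectors on a 4-point space.
S = [[1, 1, 0, 0], [0, 1, 1, 0], [1, 1, 1, 1]]
f = [1.0, 1.0, 1.0, 0.0]
eps = 0.1
f_str, f_psd, chosen = orthogonal_weak_decomposition(f, S, eps)
```

Unlike the greedy version, the output here satisfies
$\langle f_{str}, f_{psd}\rangle_H = 0$ up to rounding, at the price of
recomputing the projection at every step.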

We can now iterate the above lemma and use a pigeon- 
holing argument to obtain the strong structure theorem. 

Proof of Theorem 2.6. We first observe that it suffices to
prove a weakened version of Theorem 2.6 in which $f_{str}$ is
$(O_{M,\varepsilon}(1), O_{M,\varepsilon}(1))$-structured rather than $(M, M)$-structured.
This is because one can then recover the original
version of Theorem 2.6 by making $F$ more rapidly growing,
and redefining $M$; we leave the details to the reader.
Also, by increasing $F$ if necessary we may assume that $F$
is integer-valued and $F(M) > M$ for all $M$.

We now recursively define $M_0 := 1$ and $M_i := F(M_{i-1})$
for all $i \ge 1$. We then recursively define
$f_0, f_1, \ldots$ by setting $f_0 := f$, and then for each $i \ge 1$
using Lemma 2.7 to decompose $f_{i-1} = f_{str,i} + f_i$, where
$f_{str,i}$ is $(O_{M_i}(1), O_{M_i}(1))$-structured, and $f_i$ is
$1/M_i$-pseudorandom and orthogonal to $f_{str,i}$. From Pythagoras'
theorem we see that the quantity $\|f_i\|_H^2$ is decreasing, and
varies between 0 and 1. By the pigeonhole principle, we can
thus find $1 \le i \le 1/\varepsilon^2 + 1$ such that
$\|f_{i-1}\|_H^2 - \|f_i\|_H^2 \le \varepsilon^2$;
by Pythagoras' theorem, this implies that $\|f_{str,i}\|_H \le \varepsilon$.
If we then set $f_{str} := f_{str,1} + \ldots + f_{str,i-1}$, $f_{psd} := f_i$,
$f_{err} := f_{str,i}$, and $M := M_{i-1}$, we obtain the claim. □
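The pigeonhole step is elementary but worth seeing in isolation: a
non-increasing sequence of energies in $[0, 1]$ cannot drop by more than
$\varepsilon^2$ more than $1/\varepsilon^2$ times. The Python sketch below
finds the first small drop; the function name `pigeonhole_index` and the
sample energies are assumptions for illustration.

```python
def pigeonhole_index(energies, eps):
    """Given energies[0] >= energies[1] >= ... in [0, 1], return the first
    i >= 1 with energies[i-1] - energies[i] <= eps**2 (None if absent)."""
    for i in range(1, len(energies)):
        if energies[i - 1] - energies[i] <= eps ** 2:
            return i
    return None

eps = 0.5
# Hypothetical energies ||f_i||^2 for i = 0, ..., floor(1/eps^2) + 1.
energies = [1.0, 0.7, 0.4, 0.35, 0.05, 0.0]
i = pigeonhole_index(energies, eps)
```

With at least $\lfloor 1/\varepsilon^2 \rfloor + 2$ energies supplied, the
function cannot return `None`, since otherwise the total drop would
exceed 1.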

Remark 2.8. By tweaking the above argument a little bit,
one can also ensure that the quantities $f_{str}$, $f_{psd}$, $f_{err}$ in
Theorem 2.6 are orthogonal to each other. We leave the details
to the reader.

Remark 2.9. The bound $O_{F,\varepsilon}(1)$ on $M$ in Theorem 2.6 is
quite poor in practice; roughly speaking, it is obtained by
iterating $F$ about $O(1/\varepsilon^2)$ times. Thus for instance if $F$ is
of exponential growth (which is typical in applications), $M$
can be tower-exponential size in $\varepsilon$. These excessively large
values of $M$ unfortunately seem to be necessary in many
cases, see e.g. [8] for a discussion in the case of the
Szemerédi regularity lemma, which can be deduced as a
consequence of Theorem 2.6.
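To get a feel for the growth described in Remark 2.9, one can iterate a
sample growth function a few times. The sketch below uses the
hypothetical choice $F(M) = 2^M$ and the illustrative helper name
`iterate_growth`; neither is prescribed by the text.

```python
def iterate_growth(F, steps, start=1):
    """Return [M_0, M_1, ..., M_steps] with M_0 = start, M_i = F(M_{i-1})."""
    values = [start]
    for _ in range(steps):
        values.append(F(values[-1]))
    return values

# Hypothetical exponential growth function F(M) = 2^M.
tower = iterate_growth(lambda M: 2 ** M, steps=4)
```

Four iterations already reach 65536, and one more gives a number with
nearly twenty thousand digits, which is the tower-type behaviour
mentioned in the remark.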

To illustrate how the strong structure theorem works in
practice, we use it to deduce the arithmetic regularity lemma
of Green [13] (applied in the model case of the Hamming
cube $\mathbf{F}_2^n$). Let $A$ be a subset of $\mathbf{F}_2^n$, and let
$1_A : \mathbf{F}_2^n \to \{0, 1\}$ be the indicator function. If $V$ is an
affine subspace (over $\mathbf{F}_2$) of $\mathbf{F}_2^n$, we say that $A$ is
$\varepsilon$-regular in $V$ for some $0 < \varepsilon < 1$ if we have

$$|\mathbf{E}_{x \in V} (1_A(x) - \delta_V) e_\xi(x)| \le \varepsilon$$

for all characters $e_\xi$, where
$\mathbf{E}_{x \in V} f(x) := \frac{1}{|V|} \sum_{x \in V} f(x)$
denotes the average value of $f$ on $V$, and
$\delta_V := \mathbf{E}_{x \in V} 1_A(x) = |A \cap V| / |V|$ denotes the density of $A$ in
$V$. The following result is analogous to the celebrated
Szemerédi regularity lemma:

Lemma 2.10 (Arithmetic regularity lemma). [13] Let $A \subset \mathbf{F}_2^n$
and $0 < \varepsilon \le 1$. Then there exists a subspace $V$ of
codimension $d = O_\varepsilon(1)$ such that $A$ is $\varepsilon$-regular on all but
$\varepsilon 2^d$ of the translates of $V$.

Proof. It will suffice to establish the weaker
claim that $A$ is $O(\varepsilon^{1/4})$-regular on all but
$O(\sqrt{\varepsilon} 2^d)$ of the translates of $V$, since one can simply
shrink $\varepsilon$ to obtain the original version of Lemma 2.10.

We apply Theorem 2.6 to the setting in Example 2.1,
with $f := 1_A$, and $F$ to be chosen later. This gives us an
integer $M = O_{F,\varepsilon}(1)$ and a decomposition

$$1_A = f_{str} + f_{psd} + f_{err} \qquad (3)$$

where $f_{str}$ is $(M, M)$-structured, $f_{psd}$ is
$1/F(M)$-pseudorandom, and $\|f_{err}\|_H \le \varepsilon$. The function $f_{str}$ is a
combination of at most $M$ characters, and thus there exists
a subspace $V \subset \mathbf{F}_2^n$ of codimension $d \le M$ such that $f_{str}$
is constant on all translates of $V$.

We have

$$\mathbf{E}_{x \in \mathbf{F}_2^n} |f_{err}(x)|^2 \le \varepsilon^2 \le \varepsilon = \varepsilon 2^d |V| / |\mathbf{F}_2^n|.$$

Dividing $\mathbf{F}_2^n$ into $2^d$ translates $y + V$ of $V$, we thus conclude
that we must have

$$\mathbf{E}_{x \in y+V} |f_{err}(x)|^2 \le \sqrt{\varepsilon} \qquad (4)$$

on all but at most $\sqrt{\varepsilon} 2^d$ of the translates $y + V$.

Let $y + V$ be such that (4) holds, and let $\delta_{y+V}$ be the
average of $1_A$ on $y + V$. The function $f_{str}$ equals a constant
value on $y + V$, call it $c_{y+V}$. Averaging (3) on $y + V$ we
obtain

$$\delta_{y+V} = c_{y+V} + \mathbf{E}_{x \in y+V} f_{psd}(x) + \mathbf{E}_{x \in y+V} f_{err}(x).$$

Since $f_{psd}$ is $1/F(M)$-pseudorandom, some simple
Fourier analysis (expressing $1_{y+V}$ as an average of
characters) shows that

$$|\mathbf{E}_{x \in y+V} f_{psd}(x)| \le \frac{2^n}{|V| F(M)} \le \frac{2^M}{F(M)}$$

while from (4) and Cauchy-Schwarz we have

$$|\mathbf{E}_{x \in y+V} f_{err}(x)| \le \varepsilon^{1/4}$$

and thus

$$\delta_{y+V} = c_{y+V} + O\left(\frac{2^M}{F(M)}\right) + O(\varepsilon^{1/4}).$$

By (3) we therefore have

$$1_A(x) - \delta_{y+V} = f_{psd}(x) + f_{err}(x) + O\left(\frac{2^M}{F(M)}\right) + O(\varepsilon^{1/4}).$$

Now let $e_\xi$ be an arbitrary character. By arguing as before
we have

$$|\mathbf{E}_{x \in y+V} f_{psd}(x) e_\xi(x)| \le \frac{2^M}{F(M)}$$

and

$$|\mathbf{E}_{x \in y+V} f_{err}(x) e_\xi(x)| \le \varepsilon^{1/4}$$

and thus

$$\mathbf{E}_{x \in y+V} (1_A(x) - \delta_{y+V}) e_\xi(x) = O\left(\frac{2^M}{F(M)}\right) + O(\varepsilon^{1/4}).$$

If we now set $F(M) := \varepsilon^{-1/4} 2^M$ we obtain the claim. □

For some applications of this lemma, see [13]. A
decomposition in a similar spirit can also be found in [5], [15].
The weak structure theorem for Reed-Muller codes was also
employed in [18], [14] (under the name of a Koopman-von
Neumann type theorem).

Now we obtain the Szemerédi regularity lemma itself.
Recall that if $G = (V, E)$ is a graph and $A, B$ are non-empty
disjoint subsets of $V$, we say that the pair $(A, B)$ is
$\varepsilon$-regular if for any $A' \subset A$, $B' \subset B$ with $|A'| \ge \varepsilon |A|$ and
$|B'| \ge \varepsilon |B|$, the number of edges between $A'$ and $B'$ differs
from $\delta_{A,B} |A'| |B'|$ by at most $\varepsilon |A'| |B'|$, where
$\delta_{A,B} = |E \cap (A \times B)| / |A| |B|$ is the edge density between $A$ and $B$.

Lemma 2.11 (Szemerédi regularity lemma). [24] Let
$0 < \varepsilon < 1$ and $m \ge 1$. Then if $G = (V, E)$ is a graph with
$|V| = n$ sufficiently large depending on $\varepsilon$ and $m$, then there
exists a partition $V = V_0 \cup V_1 \cup \ldots \cup V_{m'}$ with
$m \le m' \le O_{\varepsilon,m}(1)$ such that $|V_0| \le \varepsilon n$, $|V_1| = \ldots = |V_{m'}|$, and
such that all but at most $\varepsilon (m')^2$ of the pairs $(V_i, V_j)$ for
$1 \le i < j \le m'$ are $\varepsilon$-regular.

Proof. It will suffice to establish the weaker claim that
$|V_0| = O(\varepsilon n)$, and all but at most $O(\sqrt{\varepsilon} (m')^2)$ of the pairs
$(V_i, V_j)$ are $O(\varepsilon^{1/12})$-regular. We can also assume without
loss of generality that $\varepsilon$ is small.

We apply Theorem 2.6 to the setting in Example 2.3 with
$f := 1_E$ and $F$ to be chosen later. This gives us an integer
$M = O_{F,\varepsilon}(1)$ and a decomposition

$$1_E = f_{str} + f_{psd} + f_{err} \qquad (5)$$

where $f_{str}$ is $(M, M)$-structured, $f_{psd}$ is
$1/F(M)$-pseudorandom, and $\|f_{err}\|_H \le \varepsilon$. The function $f_{str}$ is a
combination of at most $M$ tensor products of indicator
functions $1_{A_i \times B_i}$. The sets $A_i$ and $B_i$ partition $V$ into at most
$2^{2M}$ sets, which we shall refer to as atoms. If $|V|$ is
sufficiently large depending on $M$, $m$ and $\varepsilon$, we can then
partition $V = V_0 \cup \ldots \cup V_{m'}$ with $m \le m' \le (m + 2^{2M}) / \varepsilon$,
$|V_0| = O(\varepsilon n)$, $|V_1| = \ldots = |V_{m'}|$, and such that each $V_i$ for
$1 \le i \le m'$ is entirely contained within an atom. In
particular $f_{str}$ is constant on $V_i \times V_j$ for all $1 \le i < j \le m'$. Since
$\varepsilon$ is small, we also have $|V_i| = \Theta(n/m')$ for $1 \le i \le m'$.
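We have

$$\mathbf{E}_{(v,w) \in V \times V} |f_{err}(v,w)|^2 \le \varepsilon$$

and in particular

$$\mathbf{E}_{1 \le i < j \le m'} \mathbf{E}_{(v,w) \in V_i \times V_j} |f_{err}(v,w)|^2 = O(\varepsilon).$$

Then we have

$$\mathbf{E}_{(v,w) \in V_i \times V_j} |f_{err}(v,w)|^2 \le \sqrt{\varepsilon} \qquad (6)$$

for all but $O(\sqrt{\varepsilon} (m')^2)$ pairs $(i, j)$.

Let $(i, j)$ be such that (6) holds. On $V_i \times V_j$, $f_{str}$ is equal
to a constant value $c_{ij}$. Also, from the pseudorandomness
of $f_{psd}$ we have

$$\left| \sum_{(v,w) \in A' \times B'} f_{psd}(v,w) \right| \le \frac{n^2}{F(M)} = O_{m,\varepsilon,M}\left( \frac{|V_i| |V_j|}{F(M)} \right)$$

for all $A' \subset V_i$ and $B' \subset V_j$. By arguing very similarly
to the proof of Lemma 2.10, we can conclude that the edge
density $\delta_{ij}$ of $E$ on $V_i \times V_j$ is

$$\delta_{ij} = c_{ij} + O(\varepsilon^{1/4}) + O_{m,\varepsilon,M}\left( \frac{1}{F(M)} \right)$$

and that

$$\left| \sum_{(v,w) \in A' \times B'} (1_E(v,w) - \delta_{ij}) \right| \le \left( O(\varepsilon^{1/4}) + O_{m,\varepsilon,M}\left( \frac{1}{F(M)} \right) \right) |V_i| |V_j|$$

for all $A' \subset V_i$ and $B' \subset V_j$. This implies that the pair
$(V_i, V_j)$ is $O(\varepsilon^{1/12}) + O_{m,\varepsilon,M}(1/F(M)^{1/3})$-regular. The
claim now follows by choosing $F$ to be a sufficiently rapidly
growing function of $M$, which depends also on $m$ and $\varepsilon$.

□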

Similar methods can yield an alternate proof of the regu- 
larity lemma for hypergraphs mi. fT2l. lin. II22I; see Il29ll. 
To oversimplify enormously, one works on higher product
spaces such as $V \times V \times V$, and uses partial tensor
products such as $(v_1, v_2, v_3) \mapsto 1_A(v_1) 1_E(v_2, v_3)$ as the
structured objects. The lower-order functions such as $1_E(v_2, v_3)$
which appear in the structured component are then decomposed
again by another application of structure theorems
(e.g. for $1_E(v_2, v_3)$, one would use the ordinary Szemerédi
regularity lemma). The ability to arbitrarily select the
various functions $F$ appearing in these structure theorems
becomes crucial in order to obtain a satisfactory hypergraph
regularity lemma.

See also [1] for another graph regularity lemma involving
an arbitrary function $F$ which is very similar in spirit to
Theorem 2.6. In the opposite direction, if one applies the
weak structure theorem (Corollary 2.5) to the product
setting (Example 2.3) one obtains a "weak regularity lemma"
very close to that in @ . 

3. Structure and randomness in a measure space

We have seen that the Hilbert space model for separating
structure from randomness is satisfactory for many
applications. However, there are times when the "$L^2$" type
of control given by this model is insufficient. A typical
example arises when one wants to decompose a function
$f : X \to \mathbf{R}$ on a probability space $(X, \mathcal{X}, \mu)$ into
structured and pseudorandom pieces, plus a small error. Using
the Hilbert space model (with $H = L^2(X)$), one can
control the $L^2$ norm of (say) the structured component $f_{str}$ by
that of the original function $f$; indeed the construction in
Theorem 2.6 ensures that $f_{str}$ is an orthogonal projection of
$f$ onto a subspace generated by some vectors in $S$.
However, in many applications one also wants to control the $L^\infty$
norm of the structured part by that of $f$, and if $f$ is
non-negative one often also wishes $f_{str}$ to be non-negative.
More generally, one would like a comparison principle: if
$f, g$ are two functions such that $f$ dominates $g$ pointwise
(i.e. $|g(x)| \le f(x)$), and $f_{str}$ and $g_{str}$ are the corresponding
structured components, we would like $f_{str}$ to dominate $g_{str}$.
One cannot deduce these facts purely from the knowledge
that $f_{str}$ is an orthogonal projection of $f$. If however we
have the stronger property that $f_{str}$ is a conditional
expectation of $f$, then we can achieve the above objectives. This
turns out to be important when establishing structure
theorems for sparse objects, for which purely $L^2$ methods are
inadequate; this was in particular a key point in the recent
proof fllS I that the primes contained arbitrarily long arith- 
metic progressions. 

In this section we fix the probability space $(X, \mathcal{X}, \mu)$,
thus $\mathcal{X}$ is a $\sigma$-algebra on the set $X$, and $\mu : \mathcal{X} \to [0, 1]$ is a
probability measure, i.e. a countably additive non-negative
measure with $\mu(X) = 1$. In many applications one can assume that the
$\sigma$-algebra $\mathcal{X}$ is finite, in which case it can be identified with a
finite partition $X = A_1 \cup \ldots \cup A_k$ of $X$ into atoms (so that
$\mathcal{X}$ consists of all sets which can be expressed as the union
of atoms).

Example 3.1 (Uniform distribution). If $X$ is a finite set,
$\mathcal{X} = 2^X$ is the power set of $X$, and $\mu(E) := |E| / |X|$
for all $E \subset X$ (i.e. $\mu$ is uniform probability measure on
$X$), then $(X, \mathcal{X}, \mu)$ is a probability space, and the atoms are
just singleton sets.

We recall the concepts of a factor and of conditional ex- 
pectation, which will be fundamental to our analysis. 

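Definition 3.2 (Factor). A factor of $(X, \mathcal{X}, \mu)$ is a triplet
$\mathbf{Y} = (Y, \mathcal{Y}, \pi)$, where $Y$ is a set, $\mathcal{Y}$ is a $\sigma$-algebra, and
$\pi : X \to Y$ is a measurable map. If $\mathbf{Y}$ is a factor, we let
$\mathcal{B}_{\mathbf{Y}} := \{\pi^{-1}(E) : E \in \mathcal{Y}\}$ be the sub-$\sigma$-algebra of $\mathcal{X}$
formed by pulling back $\mathcal{Y}$ by $\pi$. A function $f : X \to \mathbf{R}$
is said to be $\mathbf{Y}$-measurable if it is measurable with respect
to $\mathcal{B}_{\mathbf{Y}}$. If $f \in L^2(X, \mathcal{X}, \mu)$, we let
$\mathbf{E}(f|\mathbf{Y}) = \mathbf{E}(f|\mathcal{B}_{\mathbf{Y}})$
be the orthogonal projection of $f$ to the closed subspace
$L^2(X, \mathcal{B}_{\mathbf{Y}}, \mu)$ of $L^2(X, \mathcal{X}, \mu)$ consisting of $\mathbf{Y}$-measurable
functions. If $\mathbf{Y} = (Y, \mathcal{Y}, \pi)$ and $\mathbf{Y}' = (Y', \mathcal{Y}', \pi')$ are
two factors, we let $\mathbf{Y} \vee \mathbf{Y}'$ denote the factor
$(Y \times Y', \mathcal{Y} \otimes \mathcal{Y}', \pi \oplus \pi')$.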



Example 3.3 (Colourings). Let $X$ be a finite set, which we
give the uniform distribution as in Example 3.1. Suppose
we colour this set using some finite palette $Y$ by introducing
a map $\pi : X \to Y$. If we endow $Y$ with the discrete
$\sigma$-algebra $\mathcal{Y} = 2^Y$, then $\mathbf{Y} = (Y, \mathcal{Y}, \pi)$ is a factor of $(X, \mathcal{X}, \mu)$.
The $\sigma$-algebra $\mathcal{B}_{\mathbf{Y}}$ is then generated by the colour classes
$\pi^{-1}(y)$ of the colouring $\pi$. The expectation $\mathbf{E}(f|\mathbf{Y})$ of
a function $f : X \to \mathbf{R}$ is then given by the formula
$\mathbf{E}(f|\mathbf{Y})(x) := \mathbf{E}_{x' \in \pi^{-1}(\pi(x))} f(x')$ for all $x \in X$, where
$\pi^{-1}(\pi(x))$ is the colour class that $x$ lies in.
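The conditional expectation in the colouring example is just an average
over colour classes, as the following Python sketch shows. It assumes
$X = \{0, \ldots, N-1\}$ with the uniform measure; the function name
`conditional_expectation` and the sample data are illustrative.

```python
from collections import defaultdict

def conditional_expectation(f, colouring):
    """E(f|Y)(x): average of f over the colour class of x, for the uniform
    measure on X = range(len(f)); colouring[x] is the colour pi(x)."""
    total = defaultdict(float)
    count = defaultdict(int)
    for value, colour in zip(f, colouring):
        total[colour] += value
        count[colour] += 1
    return [total[c] / count[c] for c in colouring]

f = [1.0, 3.0, 0.0, 6.0, 3.0]
pi = ["red", "red", "blue", "blue", "blue"]
Ef = conditional_expectation(f, pi)
```

Averaging over classes keeps non-negative functions non-negative and
preserves the overall mean, two properties that a general orthogonal
projection need not have.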

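In the previous section, the concept of structure was
represented by a set $S$ of vectors. In this section, we shall
instead represent structure by a collection $\mathcal{S}$ of factors. We
say that a factor $\mathbf{Y}$ has complexity at most $M$ if it is the
join $\mathbf{Y} = \mathbf{Y}_1 \vee \ldots \vee \mathbf{Y}_m$ of $m$ factors from $\mathcal{S}$ for some
$0 \le m \le M$. We also say that a function $f \in L^2(X)$
is $\varepsilon$-pseudorandom if we have $\|\mathbf{E}(f|\mathbf{Y})\|_{L^2(X)} \le \varepsilon$ for all
$\mathbf{Y} \in \mathcal{S}$. We have an analogue of Lemma 2.4:

Lemma 3.4 (Lack of pseudorandomness implies energy
increment). Let $(X, \mathcal{X}, \mu)$ and $\mathcal{S}$ be as above. Let $f \in L^2(X)$
be such that $f - \mathbf{E}(f|\mathbf{Y})$ is not $\varepsilon$-pseudorandom for some
$0 < \varepsilon \le 1$ and some factor $\mathbf{Y}$. Then there exists $\mathbf{Y}' \in \mathcal{S}$
such that
$\|\mathbf{E}(f|\mathbf{Y} \vee \mathbf{Y}')\|_{L^2(X)}^2 > \|\mathbf{E}(f|\mathbf{Y})\|_{L^2(X)}^2 + \varepsilon^2$.

Proof. By hypothesis we have

$$\|\mathbf{E}(f - \mathbf{E}(f|\mathbf{Y}) | \mathbf{Y}')\|_{L^2(X)}^2 > \varepsilon^2$$

for some $\mathbf{Y}' \in \mathcal{S}$. By Pythagoras' theorem, this implies that

$$\|\mathbf{E}(f - \mathbf{E}(f|\mathbf{Y}) | \mathbf{Y} \vee \mathbf{Y}')\|_{L^2(X)}^2 > \varepsilon^2.$$

By Pythagoras' theorem again, the left-hand side is
$\|\mathbf{E}(f|\mathbf{Y} \vee \mathbf{Y}')\|_{L^2(X)}^2 - \|\mathbf{E}(f|\mathbf{Y})\|_{L^2(X)}^2$,
and the claim follows. □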

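We then obtain an analogue of Lemma 2.7:

Lemma 3.5 (Weak structure theorem). Let $(X, \mathcal{X}, \mu)$ and
$\mathcal{S}$ be as above. Let $f \in L^2(X)$ be such that $\|f\|_{L^2(X)} \le 1$,
let $\mathbf{Y}$ be a factor, and let $0 < \varepsilon \le 1$. Then there exists a
decomposition $f = f_{str} + f_{psd}$, where $f_{str} = \mathbf{E}(f|\mathbf{Y} \vee \mathbf{Y}')$
for some factor $\mathbf{Y}'$ of complexity at most $1/\varepsilon^2$, and $f_{psd}$ is
$\varepsilon$-pseudorandom.

Proof. We construct factors $\mathbf{Y}_1, \mathbf{Y}_2, \ldots, \mathbf{Y}_m \in \mathcal{S}$ by the
following algorithm:

• Step 0: Initialise $m = 0$.

• Step 1: Write $\mathbf{Y}' := \mathbf{Y}_1 \vee \ldots \vee \mathbf{Y}_m$,
$f_{str} := \mathbf{E}(f|\mathbf{Y} \vee \mathbf{Y}')$, and $f_{psd} := f - f_{str}$.

• Step 2: If $f_{psd}$ is $\varepsilon$-pseudorandom then STOP.
Otherwise, by Lemma 3.4 we can find $\mathbf{Y}_{m+1} \in \mathcal{S}$ such
that $\|\mathbf{E}(f|\mathbf{Y} \vee \mathbf{Y}' \vee \mathbf{Y}_{m+1})\|_{L^2(X)}^2 >
\|\mathbf{E}(f|\mathbf{Y} \vee \mathbf{Y}')\|_{L^2(X)}^2 + \varepsilon^2$.

• Step 3: Increment $m$ to $m + 1$ and return to Step 1.

Since the "energy" $\|f_{str}\|_{L^2(X)}^2$ ranges between 0 and 1 (by
the hypothesis $\|f\|_{L^2(X)} \le 1$) and increments by at least $\varepsilon^2$ at each
stage, we see that this algorithm terminates in at most $1/\varepsilon^2$
steps. The claim follows. □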
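The energy increment iteration can be sketched in Python when every
factor is a finite colouring of $X = \{0, \ldots, N-1\}$ with the uniform
measure. This is a simplified illustration under stated assumptions: the
base factor is taken to be trivial, a factor is represented by its list
of labels, and the names `cond_exp`, `l2_norm`, `join` and
`energy_increment` are hypothetical.

```python
from collections import defaultdict

def cond_exp(f, labels):
    """Conditional expectation of f on the partition given by labels
    (uniform measure on X = range(len(f)))."""
    total, count = defaultdict(float), defaultdict(int)
    for value, lab in zip(f, labels):
        total[lab] += value
        count[lab] += 1
    return [total[lab] / count[lab] for lab in labels]

def l2_norm(f):
    """L^2 norm with respect to the uniform probability measure."""
    return (sum(a * a for a in f) / len(f)) ** 0.5

def join(labels_a, labels_b):
    """Join of two factors: the common refinement of the two partitions."""
    return list(zip(labels_a, labels_b))

def energy_increment(f, S, eps):
    """Returns (f_str, f_psd, used) with f_str = E(f | join of used factors)."""
    current = [()] * len(f)            # trivial factor: a single atom
    used = []
    while True:
        f_str = cond_exp(f, current)
        f_psd = [a - b for a, b in zip(f, f_str)]
        bad = [i for i, Y in enumerate(S) if l2_norm(cond_exp(f_psd, Y)) > eps]
        if not bad:
            return f_str, f_psd, used  # f_psd is eps-pseudorandom
        current = join(current, S[bad[0]])
        used.append(bad[0])

# Two hypothetical colourings of an 8-point space, and a function f
# that is measurable with respect to the first one.
S = [[0, 0, 0, 0, 1, 1, 1, 1], [0, 0, 1, 1, 1, 1, 0, 0]]
f = [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
eps = 0.1
f_str, f_psd, used = energy_increment(f, S, eps)
```

Here a single join already captures `f` exactly, so the loop stops after
one step with a vanishing pseudorandom part.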

Iterating this we obtain an analogue of Theorem 2.6:

Theorem 3.6 (Strong structure theorem). Let $(X, \mathcal{X}, \mu)$
and $\mathcal{S}$ be as above. Let $f \in L^2(X)$ be such that
$\|f\|_{L^2(X)} \le 1$, let $\varepsilon > 0$, and let $F : \mathbf{Z}^+ \to \mathbf{R}^+$ be
an arbitrary function. Then we can find an integer
$M = O_{F,\varepsilon}(1)$ and a decomposition (2) where $f_{str} = \mathbf{E}(f|\mathbf{Y})$ for
some factor $\mathbf{Y}$ of complexity at most $M$, $f_{psd}$ is
$1/F(M)$-pseudorandom, and $f_{err}$ has norm at most $\varepsilon$.

Proof. Without loss of generality we may assume $F(M) \ge 2M$.
Also, it will suffice to allow $\mathbf{Y}$ to have complexity
$O(M)$ rather than $M$.

We recursively define $M_0 := 1$ and $M_i := F(M_{i-1})^2$
for all $i \ge 1$. We then recursively define factors
$\mathbf{Y}_0, \mathbf{Y}_1, \mathbf{Y}_2, \ldots$ by setting $\mathbf{Y}_0$ to be the trivial factor, and
then for each $i \ge 1$ using Lemma 3.5 to find a factor
$\mathbf{Y}_i'$ of complexity at most $M_i$ such that
$f - \mathbf{E}(f|\mathbf{Y}_{i-1} \vee \mathbf{Y}_i')$ is $1/F(M_{i-1})$-pseudorandom, and then setting
$\mathbf{Y}_i := \mathbf{Y}_{i-1} \vee \mathbf{Y}_i'$. By Pythagoras' theorem and the hypothesis
$\|f\|_{L^2(X)} \le 1$, the energy $\|\mathbf{E}(f|\mathbf{Y}_i)\|_{L^2(X)}^2$ is
increasing in $i$, and is bounded between 0 and 1. By the
pigeonhole principle, we can thus find $1 \le i \le 1/\varepsilon^2 + 1$
such that $\|\mathbf{E}(f|\mathbf{Y}_i)\|_{L^2(X)}^2 - \|\mathbf{E}(f|\mathbf{Y}_{i-1})\|_{L^2(X)}^2 \le \varepsilon^2$;
by Pythagoras' theorem, this implies that
$\|\mathbf{E}(f|\mathbf{Y}_i) - \mathbf{E}(f|\mathbf{Y}_{i-1})\|_{L^2(X)} \le \varepsilon$. If we then set
$f_{str} := \mathbf{E}(f|\mathbf{Y}_{i-1})$, $f_{psd} := f - \mathbf{E}(f|\mathbf{Y}_i)$,
$f_{err} := \mathbf{E}(f|\mathbf{Y}_i) - \mathbf{E}(f|\mathbf{Y}_{i-1})$, and
$M := M_{i-1}$, we obtain the claim. □

This theorem can be used to give alternate proofs of
Lemma 2.10 and Lemma 2.11; we leave this as an exercise
to the reader (but see [25] for a proof of Lemma 2.11
essentially relying on Theorem 3.6).

As mentioned earlier, the key advantage of these types of structure theorems is that the structured component $f_{\mathrm{str}}$ is now obtained as a conditional expectation of the original function $f$ rather than merely an orthogonal projection, and so one has good "$L^1$" and "$L^\infty$" control on $f_{\mathrm{str}}$ rather than just $L^2$ control. In particular, these structure theorems are good for controlling sparsely supported functions $f$ (such as the normalised indicator function of a sparse set), by obtaining a densely supported function $f_{\mathrm{str}}$ which models the behaviour of $f$ in some key respects. Let us give a simplified "sparse structure theorem" which is too restrictive for real applications, but which serves to illustrate the main concept.

Theorem 3.7 (Sparse structure theorem, toy version). Let $0 < \varepsilon < 1$, let $F : \mathbf{Z}^+ \to \mathbf{R}^+$ be a function, and let $N$ be an integer parameter. Let $(X, \mathcal{X}, \mu)$ and $\mathcal{S}$ be as above, and depending on $N$. Let $\nu \in L^1(X)$ be a non-negative function (also depending on $N$) with the property that for every $M > 0$, we have the "pseudorandomness" property

$$\|\mathbf{E}(\nu|Y)\|_{L^\infty(X)} \le 1 + o_M(1) \tag{7}$$

for all factors $Y$ of complexity at most $M$, where $o_M(1)$ is a quantity which goes to zero as $N$ goes to infinity for any fixed $M$. Let $f : X \to \mathbf{R}$ (which also depends on $N$) obey the pointwise estimate $0 \le f(x) \le \nu(x)$ for all $x \in X$. Then, if $N$ is sufficiently large depending on $F$ and $\varepsilon$, we can find an integer $M = O_{F,\varepsilon}(1)$ and a decomposition (2) where $f_{\mathrm{str}} = \mathbf{E}(f|Y)$ for some factor $Y$ of complexity at most $M$, $f_{\mathrm{psd}}$ is $1/F(M)$-pseudorandom, and $f_{\mathrm{err}}$ has norm at most $\varepsilon$. Furthermore, we have

$$0 \le f_{\mathrm{str}}(x) \le 1 + o_{F,\varepsilon}(1) \tag{8}$$

and

$$\int_X f_{\mathrm{str}}\, d\mu = \int_X f\, d\mu. \tag{9}$$

An example to keep in mind is where $X = \{1, \ldots, N\}$ with the uniform probability measure $\mu$, $\mathcal{S}$ consists of the $\sigma$-algebras generated by a single discrete interval $\{n \in \mathbf{Z} : a \le n \le b\}$ for $1 \le a \le b \le N$, and $\nu$ being the function $\nu(x) = \log N \, 1_A(x)$, where $A$ is a randomly chosen subset of $\{1, \ldots, N\}$ with $\mathbf{P}(x \in A) = \frac{1}{\log N}$ for all $1 \le x \le N$; one can then verify (7) with high probability using tools such as Chernoff's inequality. Observe that $\nu$ is bounded in $L^1(X)$ uniformly in $N$, but is unbounded in $L^2(X)$. Very roughly speaking, the above theorem states that any dense subset $B$ of $A$ can be effectively "modelled" in some sense by a dense subset of $\{1, \ldots, N\}$, normalised by a factor of $\frac{1}{\log N}$; this can be seen by applying the above theorem to the function $f := \log N \, 1_B(x)$.
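A quick numerical look at this example may help fix ideas. The sketch below is illustrative only: the value $N = 20000$, the seed, and the helper names `sparse_majorant` and `interval_average` are assumptions made here, and the code samples only a few long intervals rather than attempting to verify (7) for every factor.

```python
import math
import random

def sparse_majorant(N, seed=0):
    """Random A in {1,...,N} with P(x in A) = 1/log N, and nu = log N * 1_A."""
    rng = random.Random(seed)
    p = 1.0 / math.log(N)
    A = [rng.random() < p for _ in range(N)]
    nu = [math.log(N) if in_A else 0.0 for in_A in A]
    return A, nu

def interval_average(nu, a, b):
    """Average of nu over {a,...,b} (1-indexed, inclusive).

    This is the value taken by E(nu|Y) on the atom {a,...,b} of the factor
    generated by that single interval.
    """
    return sum(nu[a - 1:b]) / (b - a + 1)

N = 20000
A, nu = sparse_majorant(N)
l1_norm = sum(nu) / N                        # stays close to 1
l2_norm_sq = sum(v * v for v in nu) / N      # of size about log N
quarter_averages = [interval_average(nu, 1 + k * N // 4, (k + 1) * N // 4)
                    for k in range(4)]
```

Note that `interval_average(nu, x, x)` equals $\log N$ whenever $x \in A$, so very short intervals are not controlled by this computation; compare Remark 3.8.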



Proof. We run the proof of Lemma 3.5 and Theorem 3.6 again. Observe that we no longer have the bound $\|f\|_{L^2(X)} \le 1$. However, from (7) and the pointwise bound $0 \le f \le \nu$ we know that

$$\|\mathbf{E}(f|Y)\|_{L^\infty(X)} \le \|\mathbf{E}(\nu|Y)\|_{L^\infty(X)} \le 1 + o_M(1)$$

for all $Y$ of complexity at most $M$. In particular, for $N$ large enough depending on $M$ we have

$$\|\mathbf{E}(f|Y)\|_{L^2(X)}^2 \le 2 \tag{10}$$

(say). This allows us to obtain an analogue of Lemma 3.5 as before (with slightly worse constants), assuming that $N$ is sufficiently large depending on $\varepsilon$, by repeating the proof more or less verbatim. One can then repeat the proof of Theorem 3.6 again using (10), to obtain the desired decomposition. The claim (8) follows immediately from (7), and (9) follows since $\int_X \mathbf{E}(f|Y)\, d\mu = \int_X f\, d\mu$ for any factor $Y$. □

Remark 3.8. In applications, one does not quite have the property (7); instead, one can bound $\mathbf{E}(\nu|Y)$ by $1 + o_M(1)$ outside of a small exceptional set, which has measure $o(1)$ with respect to $\mu$ and $\nu$. In such cases it is still possible to obtain a structure theorem similar to Theorem 3.7; see [16, Theorem 8.1], [26, Theorem 3.9], or [31, Theorem 4.7]. These structure theorems have played an indispensable role in establishing the existence of patterns (such as arithmetic progressions) inside sparse sets such as the prime numbers, by viewing them as dense subsets of sparse pseudorandom sets (such as the almost prime numbers), and then appealing to a sparse structure theorem to model the original set by a much denser set, to which one can apply deep theorems (such as Szemerédi's theorem [24]) to detect the desired pattern.

The reader may observe one slight difference between the concept of pseudorandomness discussed here, and the concept in the previous section. Here, a function $f_{\mathrm{psd}}$ is considered pseudorandom if its conditional expectations $\mathbf{E}(f_{\mathrm{psd}}|Y)$ are small for various structured $Y$. In the previous section, a function $f_{\mathrm{psd}}$ is considered pseudorandom if its correlations $\langle f_{\mathrm{psd}}, g \rangle_H$ are small for various structured $g$. However, it is possible to relate the two notions of pseudorandomness by the simple device of using a structured function $g$ to generate a structured factor $Y_g$. In measure theory, this is usually done by taking the level sets $g^{-1}([a, b])$ of $g$ and seeing what $\sigma$-algebra they generate. In many quantitative applications, though, it is too expensive to take all of these level sets, and so instead one only takes a finite number of these level sets to create the relevant factor. The following lemma illustrates this construction:

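Lemma 3.9 (Correlation with a function implies non-trivial projection). Let $(X, \mathcal{X}, \mu)$ be a probability space. Let $f \in L^1(X)$ and $g \in L^2(X)$ be such that $\|f\|_{L^1(X)} \le 1$ and $\|g\|_{L^2(X)} \le 1$. Let $\varepsilon > 0$ and $0 \le \alpha < 1$, and let $Y$ be the factor $Y = (\mathbf{R}, \mathcal{Y}, g)$, where $\mathcal{Y}$ is the $\sigma$-algebra generated by the intervals $[(n + \alpha)\varepsilon, (n + 1 + \alpha)\varepsilon)$ for $n \in \mathbf{Z}$. Then we have

$$\|\mathbf{E}(f|Y)\|_{L^2(X)} \ge |\langle f, g \rangle_{L^2(X)}| - \varepsilon.$$

Proof. Observe that the atoms of $\mathcal{B}_Y$ are generated by level sets $g^{-1}([(n + \alpha)\varepsilon, (n + 1 + \alpha)\varepsilon))$, and on these level sets $g$ fluctuates by at most $\varepsilon$. Thus

$$\|g - \mathbf{E}(g|Y)\|_{L^\infty(X)} \le \varepsilon.$$

Since $\|f\|_{L^1(X)} \le 1$, we conclude

$$|\langle f, g \rangle_{L^2(X)} - \langle f, \mathbf{E}(g|Y) \rangle_{L^2(X)}| \le \varepsilon.$$

On the other hand, by Cauchy-Schwarz and the hypothesis $\|g\|_{L^2(X)} \le 1$ we have

$$|\langle f, \mathbf{E}(g|Y) \rangle_{L^2(X)}| = |\langle \mathbf{E}(f|Y), g \rangle_{L^2(X)}| \le \|\mathbf{E}(f|Y)\|_{L^2(X)}.$$

The claim follows. □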
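The construction in Lemma 3.9 is easy to test numerically. The following sketch assumes a finite set with uniform probability measure and the normalisations $\|f\|_{L^1} \le 1$, $\|g\|_{L^2} \le 1$ that the proof uses; the sample data and the helper names (`level_set_labels`, `cond_exp`, `inner`) are hypothetical.

```python
import math
import random

def cond_exp(f, labels):
    """E(f|Y) for the uniform measure; atoms of Y are the classes of equal label."""
    sums, counts = {}, {}
    for value, atom in zip(f, labels):
        sums[atom] = sums.get(atom, 0.0) + value
        counts[atom] = counts.get(atom, 0) + 1
    return [sums[atom] / counts[atom] for atom in labels]

def inner(f, g):
    """Inner product in L^2 of the uniform probability measure."""
    return sum(a * b for a, b in zip(f, g)) / len(f)

def level_set_labels(g, eps, alpha):
    """Label x by the integer n with (n + alpha) * eps <= g(x) < (n + 1 + alpha) * eps."""
    return [math.floor(v / eps - alpha) for v in g]

rng = random.Random(1)
N = 200
g = [rng.gauss(0.0, 1.0) for _ in range(N)]
g_norm = math.sqrt(inner(g, g))
g = [v / g_norm for v in g]                      # now ||g||_{L^2} = 1
u = [v + 0.5 * rng.gauss(0.0, 1.0) for v in g]   # a function correlated with g
u_norm = sum(abs(v) for v in u) / N
f = [v / u_norm for v in u]                      # now ||f||_{L^1} = 1

eps, alpha = 0.25, 0.3
labels = level_set_labels(g, eps, alpha)
fluctuation = max(abs(a - b) for a, b in zip(g, cond_exp(g, labels)))
Ef = cond_exp(f, labels)
projection_norm = math.sqrt(inner(Ef, Ef))
lower_bound = abs(inner(f, g)) - eps
```

Here `fluctuation` plays the role of $\|g - \mathbf{E}(g|Y)\|_{L^\infty(X)}$, and `projection_norm` should be at least `lower_bound`, as the lemma asserts.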

This type of lemma is relied upon in the above-mentioned papers [16], [26], [31] to convert pseudorandomness in the conditional expectation sense to pseudorandomness in the correlation sense. In applications it is also convenient to randomise the shift parameter $\alpha$ in order to average away all boundary effects; see e.g. [32, Lemma 3.6].

4. Structure and randomness via uniformity 
norms 

In the preceding sections, we specified the notion of structure (either via a set $S$ of vectors, or a collection $\mathcal{S}$ of factors), which then created a dual notion of pseudorandomness for which one had a structure theorem. Such decompositions give excellent control on the structured component $f_{\mathrm{str}}$ of the function, but the control on the pseudorandom part $f_{\mathrm{psd}}$ can be rather weak. There is an opposing approach, in which one first specifies the notion of pseudorandomness one would like to have for $f_{\mathrm{psd}}$, and then works as hard as one can to obtain a useful corresponding notion of structure. In this approach, the pseudorandom component $f_{\mathrm{psd}}$ is easy to dispose of, but then all the difficulty gets shifted to getting an adequate control on the structured component.

A particularly useful family of notions of pseudorandomness arises from the Gowers uniformity norms $\|f\|_{U^d(G)}$. These norms can be defined on any finite additive group $G$, and for complex-valued functions $f : G \to \mathbf{C}$, but for simplicity let us restrict attention to a Hamming cube $G = \mathbf{F}_2^n$ and to real-valued functions $f : \mathbf{F}_2^n \to \mathbf{R}$. (For more general groups and complex-valued functions, see [33]. For applications to graphs and hypergraphs, one
can use the closely related Gowers box norms; see fTTl . 
US, EQI, lEa, lOgi, [Ul.) in that case, the uniformity 
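norm $\|f\|_{U^d(\mathbf{F}_2^n)}$ can be defined for $d \ge 1$ by the formula

$$\|f\|_{U^d(\mathbf{F}_2^n)}^{2^d} := \mathbf{E}_{L : \mathbf{F}_2^d \to \mathbf{F}_2^n} \prod_{a \in \mathbf{F}_2^d} f(L(a))$$

where $L$ ranges over all affine-linear maps from $\mathbf{F}_2^d$ to $\mathbf{F}_2^n$ (not necessarily injective). For instance, we have

$$\|f\|_{U^1(\mathbf{F}_2^n)} = |\mathbf{E}_{x,h \in \mathbf{F}_2^n} f(x) f(x+h)|^{1/2} = |\mathbf{E}_{x \in \mathbf{F}_2^n} f(x)|$$

$$\|f\|_{U^2(\mathbf{F}_2^n)} = |\mathbf{E}_{x,h,k \in \mathbf{F}_2^n} f(x) f(x+h) f(x+k) f(x+h+k)|^{1/4} = \big|\mathbf{E}_{h \in \mathbf{F}_2^n} |\mathbf{E}_{x \in \mathbf{F}_2^n} f(x) f(x+h)|^2\big|^{1/4}$$

$$\|f\|_{U^3(\mathbf{F}_2^n)} = |\mathbf{E}_{x,h_1,h_2,h_3 \in \mathbf{F}_2^n} f(x) f(x+h_1) f(x+h_2) f(x+h_3) f(x+h_1+h_2) f(x+h_1+h_3) f(x+h_2+h_3) f(x+h_1+h_2+h_3)|^{1/8}.$$

It is possible to show that the norms $\|\cdot\|_{U^d(\mathbf{F}_2^n)}$ are indeed a norm for $d \ge 2$, and a semi-norm for $d = 1$; see e.g. [33]. These norms are also monotone in $d$:

$$0 \le \|f\|_{U^1(\mathbf{F}_2^n)} \le \|f\|_{U^2(\mathbf{F}_2^n)} \le \|f\|_{U^3(\mathbf{F}_2^n)} \le \cdots \le \|f\|_{L^\infty(\mathbf{F}_2^n)}. \tag{11}$$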

The $d = 2$ norm is related to the Fourier coefficients $\hat f(\xi)$ defined in (1) by the important (and easily verified) identity

$$\|f\|_{U^2(\mathbf{F}_2^n)} = \Big( \sum_{\xi \in \mathbf{F}_2^n} |\hat f(\xi)|^4 \Big)^{1/4}. \tag{12}$$

More generally, the uniformity norms $\|f\|_{U^d(\mathbf{F}_2^n)}$ for $d \ge 1$ are related to Reed-Muller codes of order $d - 1$ (although this is partly conjectural for $d \ge 4$), but the relationship cannot be encapsulated in an identity as elegant as (12) once $d \ge 3$. We will return to this point shortly.
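For small $n$ the uniformity norms can be computed by brute force, which gives a way to sanity-check the monotonicity (11) and the identity (12). The sketch below is an illustration under assumptions: elements of $\mathbf{F}_2^n$ are encoded as integers with XOR as addition, the Fourier transform is taken with the normalisation $\hat f(\xi) = \mathbf{E}_x f(x) (-1)^{x \cdot \xi}$, and the function names are hypothetical.

```python
from itertools import product
import random

def gowers_norm(f, n, d):
    """U^d norm of f : F_2^n -> R, with f a list of length 2**n and XOR as addition."""
    N = 1 << n
    total = 0.0
    for x in range(N):
        for hs in product(range(N), repeat=d):
            # Build the 2^d points x + omega . h, omega in {0,1}^d, by repeated doubling.
            pts = [x]
            for h in hs:
                pts += [p ^ h for p in pts]
            term = 1.0
            for p in pts:
                term *= f[p]
            total += term
    return abs(total / N ** (d + 1)) ** (1.0 / 2 ** d)

def fourier_coefficients(f, n):
    """hat f(xi) = E_x f(x) (-1)^{x . xi} (assumed normalisation)."""
    N = 1 << n
    return [sum(f[x] * (-1) ** bin(x & xi).count("1") for x in range(N)) / N
            for xi in range(N)]

rng = random.Random(0)
n = 3
f = [rng.uniform(-1.0, 1.0) for _ in range(1 << n)]
u1, u2, u3 = (gowers_norm(f, n, d) for d in (1, 2, 3))
l4_fourier = sum(abs(c) ** 4 for c in fourier_coefficients(f, n)) ** 0.25
sup_norm = max(abs(v) for v in f)
```

The cost grows like $2^{n(d+1)}$, so this direct approach is only practical for very small $n$ and $d$.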

Let us informally call a function $f : \mathbf{F}_2^n \to \mathbf{R}$ pseudorandom of order $d - 1$ if $\|f\|_{U^d(\mathbf{F}_2^n)}$ is small; thus for instance functions with small $U^2$ norm are linearly pseudorandom (or Fourier-pseudorandom), functions with small $U^3$ norm are quadratically pseudorandom, and so forth. It turns out that functions which are pseudorandom to a suitable order become negligible for the purpose of various multilinear correlations (and the higher the order of pseudorandomness, the more complex the multilinear correlations that become negligible). This can be demonstrated by repeated application of the Cauchy-Schwarz inequality. We give a simple instance of this:

Lemma 4.1 (Generalised von Neumann theorem). Let $T_1, T_2 : \mathbf{F}_2^n \to \mathbf{F}_2^n$ be invertible linear transformations such that $T_1 - T_2$ is also invertible. Then for any $f, g, h : \mathbf{F}_2^n \to [-1, 1]$ we have

$$|\mathbf{E}_{x,r \in \mathbf{F}_2^n} f(x) g(x + T_1 r) h(x + T_2 r)| \le \|f\|_{U^2(\mathbf{F}_2^n)}.$$

Proof. By changing variables $r' := T_2 r$ if necessary we may assume that $T_2$ is the identity map $I$. We rewrite the left-hand side as

$$|\mathbf{E}_{x \in \mathbf{F}_2^n} h(x) \mathbf{E}_{r \in \mathbf{F}_2^n} f(x - r) g(x + (T_1 - I) r)|$$

and then use Cauchy-Schwarz to bound this from above by

$$\big(\mathbf{E}_{x \in \mathbf{F}_2^n} |\mathbf{E}_{r \in \mathbf{F}_2^n} f(x - r) g(x + (T_1 - I) r)|^2\big)^{1/2}$$

which one can rewrite as

$$|\mathbf{E}_{x,r,r' \in \mathbf{F}_2^n} f(x - r) f(x - r') g(x + (T_1 - I) r) g(x + (T_1 - I) r')|^{1/2};$$

applying the change of variables $(y, s, h) := (x + (T_1 - I) r, T_1 r, r - r')$, this can be rewritten as

$$|\mathbf{E}_{y,h \in \mathbf{F}_2^n} g(y) g(y + (T_1 - I) h) \mathbf{E}_{s \in \mathbf{F}_2^n} f(y + s) f(y + s + h)|^{1/2};$$

applying Cauchy-Schwarz again, one can bound this by

$$\big|\mathbf{E}_{y,h \in \mathbf{F}_2^n} |\mathbf{E}_{s \in \mathbf{F}_2^n} f(y + s) f(y + s + h)|^2\big|^{1/4}.$$

But this is equal to $\|f\|_{U^2(\mathbf{F}_2^n)}$, and the claim follows. □
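Lemma 4.1 can be spot-checked numerically on a small cube. In the sketch below the dimension $n = 4$, the particular maps `T1` and `T2`, and all function names are hypothetical choices; linear maps are stored by their columns (the images of the basis vectors, encoded as integers), and subtraction over $\mathbf{F}_2$ is XOR.

```python
import random

N_BITS = 4
T2 = [1, 2, 4, 8]                               # identity map: e_i -> e_i
T1 = [2, 3, 8, 12]                              # hypothetical invertible map (block diagonal)
T1_MINUS_T2 = [a ^ b for a, b in zip(T1, T2)]   # columns of T1 - T2

def apply_linear(cols, r):
    """Apply the F_2-linear map whose i-th column (image of e_i) is cols[i]."""
    out = 0
    for i, c in enumerate(cols):
        if (r >> i) & 1:
            out ^= c
    return out

def is_invertible(cols):
    """A linear map on a finite space is invertible iff it is a bijection."""
    size = 1 << len(cols)
    return len({apply_linear(cols, r) for r in range(size)}) == size

def u2_norm(f, n):
    """U^2 norm via |E_h |E_x f(x) f(x+h)|^2|^{1/4}."""
    N = 1 << n
    acc = 0.0
    for h in range(N):
        corr = sum(f[x] * f[x ^ h] for x in range(N)) / N
        acc += corr * corr
    return (acc / N) ** 0.25

def trilinear_average(f, g, h, n):
    """E_{x,r} f(x) g(x + T1 r) h(x + T2 r)."""
    N = 1 << n
    total = 0.0
    for x in range(N):
        for r in range(N):
            total += f[x] * g[x ^ apply_linear(T1, r)] * h[x ^ apply_linear(T2, r)]
    return total / N ** 2

def check(seed):
    """Return (left-hand side, right-hand side) of Lemma 4.1 for random f, g, h."""
    rng = random.Random(seed)
    N = 1 << N_BITS
    f, g, h = ([rng.uniform(-1.0, 1.0) for _ in range(N)] for _ in range(3))
    return abs(trilinear_average(f, g, h, N_BITS)), u2_norm(f, N_BITS)
```

For example, `check(0)` returns a pair whose first entry should not exceed the second.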

For a more systematic study of such "generalised von Neumann theorems", including some weighted versions, see Appendices B and C of [19].

In view of these generalised von Neumann theorems, it is of interest to locate conditions which would force a Gowers uniformity norm $\|f\|_{U^d(\mathbf{F}_2^n)}$ to be small. We first give a "soft" characterisation of this smallness, which at first glance seems too trivial to be of any use, but is in fact powerful enough to establish Szemerédi's theorem (see [28]) as well as the Green-Tao theorem [16]. It relies on the obvious identity

$$\|f\|_{U^d(\mathbf{F}_2^n)}^{2^d} = \langle f, \mathcal{D}f \rangle_{L^2(\mathbf{F}_2^n)}$$

where the dual function $\mathcal{D}f$ of $f$ is defined as

$$\mathcal{D}f(x) := \mathbf{E}_{L : \mathbf{F}_2^d \to \mathbf{F}_2^n;\, L(0) = x} \prod_{a \in \mathbf{F}_2^d \setminus \{0\}} f(L(a)). \tag{13}$$

As a consequence, we have

Lemma 4.2 (Dual characterisation of pseudorandomness). Let $S$ denote the set of all dual functions $\mathcal{D}F$ with $\|F\|_{L^\infty(\mathbf{F}_2^n)} \le 1$. Then if $f : \mathbf{F}_2^n \to [-1, 1]$ is such that $\|f\|_{U^d(\mathbf{F}_2^n)} \ge \varepsilon$ for some $0 < \varepsilon \le 1$, then we have $\langle f, g \rangle \ge \varepsilon^{2^d}$ for some $g \in S$.

In the converse direction, one can use the Cauchy-Schwarz-Gowers inequality (see e.g. [10], [16], [19], [33]) to show that if $\langle f, g \rangle \ge \varepsilon$ for some $g \in S$, then

$$\|f\|_{U^d(\mathbf{F}_2^n)} \ge \varepsilon.$$
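The identity behind Lemma 4.2 can be checked directly for small $n$ and $d$. In the sketch below an affine-linear map $L$ with $L(0) = x$ is parametrised by the images $h_1, \ldots, h_d$ of the basis directions, so that $L(a) = x + a_1 h_1 + \cdots + a_d h_d$; this parametrisation, the integer/XOR encoding of $\mathbf{F}_2^n$, and the function names are assumptions made for illustration.

```python
from itertools import product
import random

def cube_points(x, hs):
    """The points x + omega . h for omega in {0,1}^d; the first entry is x itself."""
    pts = [x]
    for h in hs:
        pts += [p ^ h for p in pts]
    return pts

def gowers_power(f, n, d):
    """||f||_{U^d}^{2^d}: the average of prod_omega f(x + omega . h) over x, h_1, ..., h_d."""
    N = 1 << n
    total = 0.0
    for x in range(N):
        for hs in product(range(N), repeat=d):
            term = 1.0
            for p in cube_points(x, hs):
                term *= f[p]
            total += term
    return total / N ** (d + 1)

def dual_function(f, n, d):
    """Df(x): the same average with the vertex omega = 0 (the point x) omitted."""
    N = 1 << n
    values = []
    for x in range(N):
        total = 0.0
        for hs in product(range(N), repeat=d):
            term = 1.0
            for p in cube_points(x, hs)[1:]:
                term *= f[p]
            total += term
        values.append(total / N ** d)
    return values

rng = random.Random(0)
n = 3
f = [rng.uniform(-1.0, 1.0) for _ in range(1 << n)]
checks = []
for d in (2, 3):
    Df = dual_function(f, n, d)
    pairing = sum(a * b for a, b in zip(f, Df)) / len(f)   # <f, Df>
    checks.append((pairing, gowers_power(f, n, d), max(abs(v) for v in Df)))
```

Each entry of `checks` holds $\langle f, \mathcal{D}f \rangle$, the power $\|f\|_{U^d}^{2^d}$, and the sup norm of $\mathcal{D}f$; the first two should agree and the third should be at most 1 when $|f| \le 1$.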

The above lemma gives a "soft" way to detect pseudorandomness, but is somewhat unsatisfying due to the rather non-explicit description of the "structured" set $S$. To investigate pseudorandomness further, observe that we have the recursive identity

$$\|f\|_{U^d(\mathbf{F}_2^n)}^{2^d} = \mathbf{E}_{h \in \mathbf{F}_2^n} \|f f_h\|_{U^{d-1}(\mathbf{F}_2^n)}^{2^{d-1}} \tag{14}$$

where $f_h(x) := f(x + h)$ denotes the shift of $f$ by $h$ (this identity, incidentally, can be used to quickly deduce the monotonicity (11)). From this identity and induction we quickly deduce the modulation symmetry

$$\|fg\|_{U^d(\mathbf{F}_2^n)} = \|f\|_{U^d(\mathbf{F}_2^n)} \tag{15}$$

whenever $g \in S_{d-1}(\mathbf{F}_2^n)$ is a Reed-Muller code of order at most $d - 1$. In particular, we see that $\|g\|_{U^d(\mathbf{F}_2^n)} = 1$ for such codes; thus a code of order $d - 1$ or less is definitely not pseudorandom of order $d$. A bit more generally, by combining (15) with (11) we see that

$$|\langle f, g \rangle_{L^2(\mathbf{F}_2^n)}| = \|fg\|_{U^1(\mathbf{F}_2^n)} \le \|fg\|_{U^d(\mathbf{F}_2^n)} = \|f\|_{U^d(\mathbf{F}_2^n)}.$$

In particular, any function which has a large correlation with a Reed-Muller code $g \in S_{d-1}(\mathbf{F}_2^n)$ is not pseudorandom of order $d$. It is conjectured that the converse is also true:

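Conjecture 4.3 (Gowers inverse conjecture for $\mathbf{F}_2^n$). If $d \ge 1$ and $\varepsilon > 0$ then there exists $\delta > 0$ with the following property: given any $n \ge 1$ and any $f : \mathbf{F}_2^n \to [-1, 1]$ with $\|f\|_{U^d(\mathbf{F}_2^n)} \ge \varepsilon$, there exists a Reed-Muller code $g \in S_{d-1}(\mathbf{F}_2^n)$ of order at most $d - 1$ such that $|\langle f, g \rangle_{L^2(\mathbf{F}_2^n)}| \ge \delta$.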

This conjecture, if true, would allow one to apply the machinery of previous sections and then decompose a bounded function $f : \mathbf{F}_2^n \to [-1, 1]$ (or a function dominated by a suitably pseudorandom function $\nu$) into a function $f_{\mathrm{str}}$ which was built out of a controlled number of Reed-Muller codes of order at most $d - 1$, a function $f_{\mathrm{psd}}$ which was pseudorandom of order $d$, and a small error. See for instance [14] for further discussion.
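The "easy direction" facts preceding Conjecture 4.3, namely the recursive identity (14), the modulation symmetry (15), and the bound of correlations with Reed-Muller codes by the $U^d$ norm, can be checked by brute force for $d = 3$ on $\mathbf{F}_2^3$. This is a sketch only: the quadratic polynomial $Q(x) = x_0 x_1 + x_2$, the random test function, and the helper names are hypothetical, and a Reed-Muller code of order 2 is taken to mean a function of the form $(-1)^{Q}$ with $\deg Q \le 2$.

```python
from itertools import product
import random

def gowers_norm(f, n, d):
    """U^d norm of f : F_2^n -> R by direct averaging (XOR is addition in F_2^n)."""
    N = 1 << n
    total = 0.0
    for x in range(N):
        for hs in product(range(N), repeat=d):
            pts = [x]
            for h in hs:
                pts += [p ^ h for p in pts]
            term = 1.0
            for p in pts:
                term *= f[p]
            total += term
    return abs(total / N ** (d + 1)) ** (1.0 / 2 ** d)

def bit(x, i):
    """The i-th coordinate of x in F_2^n."""
    return (x >> i) & 1

n = 3
N = 1 << n
# Hypothetical quadratic phase g = (-1)^Q with Q(x) = x_0 x_1 + x_2.
g = [(-1) ** ((bit(x, 0) & bit(x, 1)) ^ bit(x, 2)) for x in range(N)]
rng = random.Random(0)
f = [rng.uniform(-1.0, 1.0) for _ in range(N)]
fg = [a * b for a, b in zip(f, g)]

u3_f = gowers_norm(f, n, 3)
u3_g = gowers_norm(g, n, 3)     # a code of order 2 has U^3 norm 1
u2_g = gowers_norm(g, n, 2)     # but its U^2 norm is strictly smaller
u3_fg = gowers_norm(fg, n, 3)   # modulation symmetry (15)
correlation = abs(sum(fg) / N)  # |<f, g>|
# Right-hand side of (14) for d = 3, with f_h(x) = f(x + h).
rhs_14 = sum(gowers_norm([f[x] * f[x ^ h] for x in range(N)], n, 2) ** 4
             for h in range(N)) / N
```

The quantities `u3_f ** 8` and `rhs_14` should agree, and `correlation` should not exceed `u3_f`.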

The Gowers inverse conjecture is trivial to verify for $d = 1$. For $d = 2$ the claim follows quickly from the identity (12) and the Plancherel identity

$$\|f\|_{L^2(\mathbf{F}_2^n)}^2 = \sum_{\xi \in \mathbf{F}_2^n} |\hat f(\xi)|^2.$$

The conjecture for $d = 3$ was first established by Samorodnitsky [23], using ideas from [9] (see also [17], [33] for related results). The conjecture for $d > 3$ remains open; a key difficulty here is that there are a huge number of Reed-Muller codes (about $2^{\Omega(n^{d-1})}$ or so, compared to the dimension $2^n$ of $L^2(\mathbf{F}_2^n)$) and so we definitely do not have the type of orthogonality that one enjoys in the Fourier case $d = 2$. For related reasons, we do not expect any identity of the form (12) for $d > 3$ which would allow the very few Reed-Muller codes which correlate with $f$ to dominate the enormous number of Reed-Muller codes which do not in the right-hand side.
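The $d = 2$ case mentioned above can be made concrete: from (12) and Plancherel one gets $\|f\|_{U^2}^4 = \sum_\xi |\hat f(\xi)|^4 \le \sup_\xi |\hat f(\xi)|^2 \sum_\xi |\hat f(\xi)|^2 \le \sup_\xi |\hat f(\xi)|^2$ when $|f| \le 1$, so some character correlates with $f$ to at least $\|f\|_{U^2}^2$. The sketch below checks this numerically; the normalisation $\hat f(\xi) = \mathbf{E}_x f(x) (-1)^{x \cdot \xi}$ and the function names are assumptions made for illustration.

```python
import random

def u2_norm(f, n):
    """U^2 norm via |E_h |E_x f(x) f(x+h)|^2|^{1/4}, with XOR as addition."""
    N = 1 << n
    acc = 0.0
    for h in range(N):
        corr = sum(f[x] * f[x ^ h] for x in range(N)) / N
        acc += corr * corr
    return (acc / N) ** 0.25

def fourier_coefficients(f, n):
    """hat f(xi) = E_x f(x) (-1)^{x . xi} (assumed normalisation)."""
    N = 1 << n
    return [sum(f[x] * (-1) ** bin(x & xi).count("1") for x in range(N)) / N
            for xi in range(N)]

def best_character(f, n):
    """Frequency xi maximising |hat f(xi)|, together with that coefficient."""
    coeffs = fourier_coefficients(f, n)
    xi = max(range(len(coeffs)), key=lambda t: abs(coeffs[t]))
    return xi, coeffs[xi]

rng = random.Random(0)
n = 4
f = [rng.uniform(-1.0, 1.0) for _ in range(1 << n)]
xi, coeff = best_character(f, n)
u2 = u2_norm(f, n)
plancherel_gap = abs(sum(v * v for v in f) / len(f)
                     - sum(c * c for c in fourier_coefficients(f, n)))
```

In the notation of Conjecture 4.3 this corresponds to taking $\delta = \varepsilon^2$ when $d = 2$, with the linear phase $(-1)^{x \cdot \xi}$ (up to sign) as the Reed-Muller code.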

However, we can present some evidence for it here in the "99%-structured" case when $\varepsilon$ is very close to 1. Let us first handle the case when $\varepsilon = 1$:



Proposition 4.4 (100%-structured inverse theorem). Suppose $d \ge 1$ and $f : \mathbf{F}_2^n \to [-1, 1]$ is such that $\|f\|_{U^d(\mathbf{F}_2^n)} = 1$. Then $f$ is a Reed-Muller code of order at most $d - 1$.

Proof. We induct on $d$. The case $d = 1$ is obvious. Now suppose that $d \ge 2$ and that the claim has already been proven for $d - 1$. If $\|f\|_{U^d(\mathbf{F}_2^n)} = 1$, then from (14) we have

$$\mathbf{E}_{h \in \mathbf{F}_2^n} \|f f_h\|_{U^{d-1}(\mathbf{F}_2^n)}^{2^{d-1}} = 1.$$

On the other hand, from (11) we have $\|f f_h\|_{U^{d-1}(\mathbf{F}_2^n)} \le 1$ for all $h$. This forces $\|f f_h\|_{U^{d-1}(\mathbf{F}_2^n)} = 1$ for all $h$. By induction hypothesis, $f f_h$ must therefore be a Reed-Muller code of order at most $d - 2$ for all $h$. Thus for every $h$ there exists a polynomial $P_h : \mathbf{F}_2^n \to \mathbf{F}_2$ of degree at most $d - 2$ such that

$$f(x + h) = f(x) (-1)^{P_h(x)}$$

for all $x, h \in \mathbf{F}_2^n$. From this one can quickly establish by induction that for every $0 \le m \le n$, the function $f$ is a Reed-Muller code of degree at most $d - 1$ on $\mathbf{F}_2^m$ (viewed as a subspace of $\mathbf{F}_2^n$), and the claim follows. □

Handling the case when $\varepsilon$ is very close to 1 is trickier (we can no longer afford an induction on dimension, as was done in the above proof). We first need a rigidity result.

Proposition 4.5 (Rigidity of Reed-Muller codes). For every $d \ge 1$ there exists $\varepsilon > 0$ with the following property: if $n \ge 1$ and $f \in S_{d-1}(\mathbf{F}_2^n)$ is a Reed-Muller code of order at most $d - 1$ such that $\mathbf{E}_{x \in \mathbf{F}_2^n} f(x) \ge 1 - \varepsilon$, then $f \equiv 1$.

Proof. We again induct on $d$. The case $d = 1$ is obvious, so suppose $d \ge 2$ and that the claim has already been proven for $d - 1$. If $\mathbf{E}_{x \in \mathbf{F}_2^n} f(x) \ge 1 - \varepsilon$, then $\mathbf{E}_{x \in \mathbf{F}_2^n} |1 - f(x)| \le \varepsilon$. Using the crude bound $|1 - f f_h| = O(|1 - f| + |1 - f_h|)$ we conclude that $\mathbf{E}_{x \in \mathbf{F}_2^n} |1 - f f_h(x)| \le O(\varepsilon)$, and thus

$$\mathbf{E}_{x \in \mathbf{F}_2^n} f f_h(x) \ge 1 - O(\varepsilon)$$

for every $h \in \mathbf{F}_2^n$. But $f f_h$ is a Reed-Muller code of order $d - 2$, thus by induction hypothesis we have $f f_h \equiv 1$ for all $h$, if $\varepsilon$ is small enough. This forces $f$ to be constant; but since $f$ takes values in $\{-1, +1\}$ and has average at least $1 - \varepsilon$, we have $f \equiv 1$ as desired for $\varepsilon$ small enough. □
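The rigidity phenomenon of Proposition 4.5 can be observed by exhaustive enumeration in tiny cases. The sketch below lists every nonzero polynomial of degree at most $k$ on $\mathbf{F}_2^n$ and records the largest average of $(-1)^P$; the function names and the choice $n = 3$ are hypothetical, and the enumeration only covers these small cases rather than proving anything in general. The observed gap is consistent with the standard minimum-distance bound for Reed-Muller codes.

```python
from itertools import combinations

def monomials(n, k):
    """Bitmasks of all monomials of degree at most k in n variables (mask 0 is the constant 1)."""
    return [sum(1 << i for i in subset)
            for size in range(k + 1)
            for subset in combinations(range(n), size)]

def max_nontrivial_mean(n, k):
    """Largest average of (-1)^P over nonzero polynomials P of degree <= k on F_2^n."""
    monos = monomials(n, k)
    N = 1 << n
    best = -1.0
    # Distinct monomials are linearly independent as functions on F_2^n, so every
    # nonempty set of monomials gives a polynomial that is not identically zero.
    for chosen in range(1, 1 << len(monos)):
        total = 0
        for x in range(N):
            parity = 0
            for j, m in enumerate(monos):
                # The monomial with mask m equals 1 at x exactly when x contains m.
                if (chosen >> j) & 1 and (x & m) == m:
                    parity ^= 1
            total += -1 if parity else 1
        best = max(best, total / N)
    return best

gap_linear = max_nontrivial_mean(3, 1)      # codes of order <= 1 other than f = 1
gap_quadratic = max_nontrivial_mean(3, 2)   # codes of order <= 2 other than f = 1
```

In these cases no Reed-Muller code other than the constant 1 has average above `gap_quadratic`, so any $\varepsilon$ below $1 - $ `gap_quadratic` works for $d = 3$ and $n = 3$.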

Proposition 4.6 (99%-structured inverse theorem) [2]. For every $d \ge 1$ and $0 < \varepsilon < 1$ there exists $0 < \delta < 1$ with the following property: if $n \ge 1$ and $f : \mathbf{F}_2^n \to [-1, 1]$ is such that $\|f\|_{U^d(\mathbf{F}_2^n)} \ge 1 - \delta$, then there exists a Reed-Muller code $g \in S_{d-1}(\mathbf{F}_2^n)$ such that $\langle f, g \rangle_{L^2(\mathbf{F}_2^n)} \ge 1 - \varepsilon$.

Proof. We again induct on $d$. The case $d = 1$ is obvious, so suppose $d \ge 2$ and that the claim has already been proven for $d - 1$. Fix $\varepsilon$, let $\delta$ be a small number (depending on $d$ and $\varepsilon$) to be chosen later, and suppose $f : \mathbf{F}_2^n \to [-1, 1]$ is such that $\|f\|_{U^d(\mathbf{F}_2^n)} \ge 1 - \delta$. We will use $o(1)$ to denote any quantity which goes to zero as $\delta \to 0$, thus $\|f\|_{U^d(\mathbf{F}_2^n)} \ge 1 - o(1)$. We shall say that a statement is true for most $x \in \mathbf{F}_2^n$ if it is true for a proportion $1 - o(1)$ of values $x \in \mathbf{F}_2^n$.

Applying (14) we have

$$\mathbf{E}_{h \in \mathbf{F}_2^n} \|f f_h\|_{U^{d-1}(\mathbf{F}_2^n)}^{2^{d-1}} \ge 1 - o(1)$$

while from (11) we have $\|f f_h\|_{U^{d-1}(\mathbf{F}_2^n)} \le 1$. Thus we have $\|f f_h\|_{U^{d-1}(\mathbf{F}_2^n)} = 1 - o(1)$ for all $h$ in a subset $H$ of $\mathbf{F}_2^n$ of density $1 - o(1)$. Applying the inductive hypothesis, we conclude that for all $h \in H$ there exists a polynomial $P_h : \mathbf{F}_2^n \to \mathbf{F}_2$ of degree at most $d - 2$ such that

$$\mathbf{E}_{x \in \mathbf{F}_2^n} f(x) f(x + h) (-1)^{P_h(x)} \ge 1 - o(1).$$

Since $f$ is bounded in magnitude by 1, this implies for each $h \in H$ that

$$f(x + h) = f(x) (-1)^{P_h(x)} + o(1) \tag{16}$$

for most $x$. For similar reasons it also implies that $|f(x)| = 1 + o(1)$ for most $x$.

Now suppose that $h_1, h_2, h_3, h_4 \in H$ form an additive quadruple in the sense that $h_1 + h_2 = h_3 + h_4$. Then from (16) we see that

$$f(x + h_1 + h_2) = f(x) (-1)^{P_{h_1}(x) + P_{h_2}(x + h_1)} + o(1) \tag{17}$$

for most $x$, and similarly

$$f(x + h_3 + h_4) = f(x) (-1)^{P_{h_3}(x) + P_{h_4}(x + h_3)} + o(1)$$

for most $x$. Since $|f(x)| = 1 + o(1)$ for most $x$, we conclude that

$$(-1)^{P_{h_1}(x) + P_{h_2}(x + h_1) - P_{h_3}(x) - P_{h_4}(x + h_3)} = 1 + o(1)$$

for most $x$. In particular, the average of the left-hand side in $x$ is $1 - o(1)$. Applying Proposition 4.5 (and assuming $\delta$ small enough), we conclude that the left-hand side is identically 1, thus

$$P_{h_1}(x) + P_{h_2}(x + h_1) = P_{h_3}(x) + P_{h_4}(x + h_3) \tag{18}$$

for all additive quadruples $h_1 + h_2 = h_3 + h_4$ in $H$ and all $x$.

Now for any $k \in \mathbf{F}_2^n$, define the quantity $Q(k) \in \mathbf{F}_2$ by the formula

$$Q(k) := P_{h_1}(0) + P_{h_2}(h_1) \tag{19}$$

whenever $h_1, h_2 \in H$ are such that $h_1 + h_2 = k$. Note that the existence of such $h_1, h_2$ is guaranteed since most $h$ lie in $H$, and (18) ensures that the right-hand side of (19) does not depend on the exact choice of $h_1, h_2$ and so $Q$ is well-defined.



Now let $x \in \mathbf{F}_2^n$ and $h \in H$. Then, since most elements of $\mathbf{F}_2^n$ lie in $H$, we can find $r_1, r_2, s_1, s_2 \in H$ such that $r_1 + r_2 = x$ and $s_1 + s_2 = x + h$. From (17) we see that

$$f(y + x) = f(y + r_1 + r_2) = f(y) (-1)^{P_{r_1}(y) + P_{r_2}(y + r_1)} + o(1)$$

and

$$f(y + x + h) = f(y + s_1 + s_2) = f(y) (-1)^{P_{s_1}(y) + P_{s_2}(y + s_1)} + o(1)$$

for most $y$. Also from (16)

$$f(y + x + h) = f(y + x) (-1)^{P_h(y + x)} + o(1)$$

for most $y$. Combining these (and the fact that $|f(y)| = 1 + o(1)$ for most $y$) we see that

$$(-1)^{P_{s_1}(y) + P_{s_2}(y + s_1) - P_{r_1}(y) - P_{r_2}(y + r_1) - P_h(y + x)} = 1 + o(1)$$

for most $y$. Taking expectations and applying Proposition 4.5 as before, we conclude that

$$P_{s_1}(y) + P_{s_2}(y + s_1) - P_{r_1}(y) - P_{r_2}(y + r_1) - P_h(y + x) = 0$$

for all $y$. Specialising to $y = 0$ and applying (19) we conclude that

$$P_h(x) = Q(x + h) - Q(x) = Q_h(x) - Q(x) \tag{20}$$

for all $x \in \mathbf{F}_2^n$ and $h \in H$; thus we have successfully "integrated" $P_h(x)$. We can then extend $P_h(x)$ to all $h \in \mathbf{F}_2^n$ (not just $h \in H$) by viewing (20) as a definition. Observe that if $h \in \mathbf{F}_2^n$, then $h = h_1 + h_2$ for some $h_1, h_2 \in H$, and from (20) we have

$$P_h(x) = P_{h_1}(x) + P_{h_2}(x + h_1).$$

In particular, since the right-hand side is a polynomial of degree at most $d - 2$, the left-hand side is also. Thus we see that $Q_h - Q$ is a polynomial of degree at most $d - 2$ for all $h$, which easily implies that $Q$ itself is a polynomial of degree at most $d - 1$. If we then set $g(x) := f(x) (-1)^{Q(x)}$, then from (16), (20) we see that for every $h \in H$ we have

$$g(x + h) = g(x) + o(1)$$

for most $x$. From Fubini's theorem, we thus conclude that there exists an $x$ such that $g(x + h) = g(x) + o(1)$ for most $h$, thus $g$ is almost constant. Since $|g(x)| = 1 + o(1)$ for most $x$, we thus conclude the existence of a sign $\sigma \in \{-1, +1\}$ such that $g(x) = \sigma + o(1)$ for most $x$. We conclude that

$$f(x) = \sigma (-1)^{Q(x)} + o(1)$$

for most $x$, and the claim then follows (assuming $\delta$ is small enough). □
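The step in this proof asserting that $Q$ has degree at most $d - 1$ once every difference $Q_h - Q$ has degree at most $d - 2$ can be checked exhaustively for small $n$. The sketch below computes degrees via the algebraic normal form (a binary Möbius transform of the truth table); the function names, the convention that the zero polynomial has degree $-1$, and the choice $n = 3$ are assumptions made for illustration.

```python
def anf_degree(table, n):
    """Degree of Q : F_2^n -> F_2 given by its truth table (zero polynomial: -1)."""
    coeffs = list(table)
    # In-place binary Moebius transform: afterwards coeffs[m] is the coefficient
    # of the monomial whose variables are the set bits of m.
    for i in range(n):
        for x in range(1 << n):
            if (x >> i) & 1:
                coeffs[x] ^= coeffs[x ^ (1 << i)]
    degrees = [bin(m).count("1") for m in range(1 << n) if coeffs[m]]
    return max(degrees) if degrees else -1

def derivative(table, h):
    """Truth table of x -> Q(x + h) - Q(x); over F_2 both operations are XOR."""
    return [table[x ^ h] ^ table[x] for x in range(len(table))]

def degree_matches_derivatives(n, k):
    """Check over all Q : F_2^n -> F_2 that deg Q <= k iff every Q_h - Q has degree <= k - 1."""
    N = 1 << n
    for code in range(1 << N):
        table = [(code >> x) & 1 for x in range(N)]
        low_degree = anf_degree(table, n) <= k
        low_derivatives = all(anf_degree(derivative(table, h), n) <= k - 1
                              for h in range(N))
        if low_degree != low_derivatives:
            return False
    return True

# Hypothetical example on F_2^4: Q(x) = x_0 x_1 x_2 + x_0 x_3 + x_1 has degree 3.
example = [((x & 1) & ((x >> 1) & 1) & ((x >> 2) & 1))
           ^ ((x & 1) & ((x >> 3) & 1))
           ^ ((x >> 1) & 1) for x in range(16)]
example_degree = anf_degree(example, 4)
example_derivative_degrees = [anf_degree(derivative(example, h), 4) for h in range(16)]
```

In the example every derivative has degree at most 2, one less than the degree of $Q$, matching the relation (20) between $P_h$ and $Q$.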



Remark 4.7. The above argument requires $\|f\|_{U^d(\mathbf{F}_2^n)}$ to be very close to 1 for two reasons. Firstly, one wishes to exploit the rigidity property; and secondly, we implicitly used on many occasions the fact that if two properties each hold $1 - o(1)$ of the time, then they jointly hold $1 - o(1)$ of the time as well. These two facts break down once we leave the "99%-structured" world and instead work in a "1%-structured" world in which various statements are only true for a proportion at least $\varepsilon$ for some small $\varepsilon$. Nevertheless, the proof of the Gowers inverse conjecture for $d = 3$ in [23] has some features in common with the above argument, giving one hope that the full conjecture could be settled by some extension of these methods.

Remark 4.8. The above result was essentially proven in [2] (extending an argument in [4] for the linear case $d = 2$), using a "majority vote" version of the dual function (13).

5. Concluding remarks 

Despite the above results, we still do not have a systematic theory of structure and randomness which covers all possible applications (particularly for "sparse" objects). For instance, there seem to be analogous structure theorems for random variables, in which one uses Shannon entropy instead of $L^2$-based energies in order to measure complexity; see [25]. In analogy with the ergodic theory literature (e.g. [7]), there may also be some advantage in pursuing relative structure theorems, in which the notions of structure and randomness are all relative to some existing "known structure", such as a reference factor $Y_0$ of a probability space $(X, \mathcal{X}, \mu)$. Finally, in the iterative algorithms used above to prove the structure theorems, the additional structures used at each stage of the iteration were drawn from a fixed stock of structures ($S$ in the Hilbert space case, $\mathcal{S}$ in the measure space case). In some applications it may be more effective to adopt a more adaptive approach, in which the stock of structures one is using varies after each iteration. A simple example of this approach is in [32], in which the structures used at each stage of the iteration are adapted to a certain spatial scale which decreases rapidly with the iteration. I expect to see several more permutations and refinements of these sorts of structure theorems developed for future applications.